Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 23:21:13 +00:00 · 2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,217 @@
+# C2 NetVLAD Mandatory Simple-Baseline
+
+**Task**: AZ-338_c2_net_vlad
+**Name**: C2 NetVLAD Mandatory Simple-Baseline
+**Description**: Implement `NetVladStrategy`, the C2 mandatory simple-baseline `VprStrategy` (engine rule: every component MUST ship a comparative baseline alongside its production default; description.md § 1 designates NetVLAD as the C2 baseline). NetVLAD has a much larger embedding dimension than UltraVPR (D=4096 with the NetVLAD-VGG16 default; reducible to D=512 via PCA-whitening per the upstream NetVLAD code drop). It runs on PyTorch FP16 (NOT TensorRT) per the simple-baseline policy, "the baseline runs on the simplest available runtime", so that a TRT engine compile bug cannot break the baseline AND the primary at the same time. The task includes the concrete `NetVladBackbonePreprocessor` (different resize target and normalisation from UltraVPR). The strategy MUST satisfy AC-2.1b's relaxed engine-rule floor `recall@10 ≥ 0.85` on the Derkachi normal segment.
+**Complexity**: 3 points
+**Dependencies**: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-300_c7_pytorch_baseline, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-266_log_module, AZ-272_fdr_record_schema
+**Component**: c2_vpr (epic AZ-255 / E-C2)
+**Tracker**: AZ-338
+**Epic**: AZ-255 (E-C2)
+
+### Document Dependencies
+
+- `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — Protocol contract; every invariant MUST be satisfied; INV-3 (L2-normalised) is critical because NetVLAD raw embeddings include intra-cluster residuals that must be globally L2-normalised after the VLAD aggregation.
+- `_docs/02_document/components/02_c2_vpr/description.md` — § 1 NetVLAD designated as mandatory simple-baseline; § 5 PyTorch matches simple-baseline track; § 9 logging.
+- `_docs/02_document/module-layout.md` — `c2_vpr.net_vlad` Internal entry; `BUILD_VPR_NETVLAD` row; `BUILD_PYTORCH_RUNTIME` row (NetVLAD requires PyTorch runtime ON which is OFF for airborne — NetVLAD is research/replay-only by build-flag combination).
+- `_docs/02_document/components/02_c2_vpr/tests.md` — C2-IT-01 engine rule check `recall@10 ≥ 0.85` for NetVLAD on Derkachi normal segment.
+- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — `InferenceRuntime` interface; AZ-300 `pytorch_fp16_runtime` is the consumed concrete runtime.
+- `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` — L2 + intra-normalisation (NetVLAD's published preprocessing chain includes intra-cluster normalisation BEFORE the global L2 normalisation; the `DescriptorNormaliser` helper must support both).
+
+## Problem
+
+Without this task:
+
+- The C2 component has no comparative baseline; the engine rule (every primary backbone has a baseline alongside it for FT-12 comparative-study and for risk reduction if the primary fails) is violated for C2 specifically — the project-wide policy goes unsatisfied for one of its largest backbone surfaces.
+- AC-2.1b's relaxed-floor check (`recall@10 ≥ 0.85` for NetVLAD) has no producer; suite-level FT-P-19 cannot validate the engine rule.
+- The research binary (which links every backbone for IT-12 comparative studies) cannot ship without a NetVLAD strategy; researchers cannot run the comparative study that informs whether the primary's engine choice is justified.
+- A code-drop, weights, or engine-compile bug in UltraVPR has no fallback at the strategy layer; an operator who notices a sudden drop in suite-level satellite re-localisation accuracy has no mechanism to A/B against the baseline.
+
+## Outcome
+
+- `src/gps_denied_onboard/components/c2_vpr/net_vlad.py` defining:
+  - `NetVladStrategy` class implementing the `VprStrategy` Protocol.
+  - Constructor signature: `__init__(self, runtime: InferenceRuntime, tile_store: TileStore, weights_path: Path, preprocessor: NetVladBackbonePreprocessor, normaliser: DescriptorNormaliser, fdr_client: FdrClient, descriptor_dim: int = 4096)`.
+  - `embed_query(frame, calibration)`:
+    1. `tensor = self._preprocessor.preprocess(frame, calibration)` (returns FP16 NCHW (1, 3, H, W); H=W=480 per the upstream NetVLAD-VGG16 default).
+    2. `intermediate = self._runtime.forward(self._engine_id, {"input": tensor})["vlad_descriptor"]` (returns FP16 (1, descriptor_dim) post-VLAD aggregation).
+    3. `intra_normalised = self._normaliser.intra_cluster_normalise(intermediate[0], num_clusters=64)` (per NetVLAD's published preprocessing: intra-cluster L2 first).
+    4. `embedding = self._normaliser.l2_normalise(intra_normalised)` (then global L2).
+    5. Return `VprQuery(frame_id, embedding, produced_at=monotonic_ns())`.
+    6. Catch `RuntimeError` → wrap in `VprBackboneError`; emit ERROR log + FDR record.
+  - `retrieve_topk(query, k)`: identical to UltraVPR — delegates to `tile_store.faiss_topk`; returns `VprResult` with `backbone_label="net_vlad"`.
+  - `descriptor_dim() -> int`: returns the constructor-passed value (default 4096); asserted at engine-load time against the engine's output tensor shape; mismatch → `ConfigurationError`.
+  - Module-level `create(config, tile_store, inference_runtime) -> VprStrategy`:
+    1. Resolve `weights_path = config.vpr.backbone_weights_path` (a PyTorch state_dict file with the `.pth` extension; NetVLAD does NOT use the AZ-281 self-describing TRT filename schema — its own AZ-280 sidecar carries the PCA matrix + cluster centres).
+    2. Resolve `descriptor_dim = config.vpr.netvlad_descriptor_dim` (default 4096; can be 512 if PCA-whitened weights are loaded).
+    3. Construct `NetVladBackbonePreprocessor(input_shape=(480, 480), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))`.
+    4. Construct `DescriptorNormaliser` with `intra_cluster_normalise` capability.
+    5. Load model via `inference_runtime.load_engine(weights_path)` (the PyTorch runtime accepts `.pth` files; AZ-300).
+    6. Assert engine output shape == `(1, descriptor_dim)`; mismatch → `ConfigurationError`.
+    7. Construct and return `NetVladStrategy(...)`.
+- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py`:
+  - Implements `BackbonePreprocessor` Protocol.
+  - `preprocess(frame, calibration)`:
+    1. Decode `frame.image_bytes` to RGB uint8 (H_in, W_in, 3).
+    2. Centre-crop to a square region (same calibration-aware logic as UltraVPR — copied here, NOT shared, because the calibration handling is part of the preprocessor's contract).
+    3. Resize to `(480, 480)` via OpenCV.
+    4. Normalise: `(pixel/255.0 - mean) / std`; cast to FP16.
+    5. Transpose HWC → CHW; add batch dim.
+    6. Return ndarray of shape `(1, 3, 480, 480)` dtype float16.
+  - `input_shape() -> tuple[int, ...]`: returns `(480, 480)`.
+  - On failure: raise `VprPreprocessError`.
+- Composition-root wiring path for `config.vpr.strategy == "net_vlad"`.
+- Logging per description.md § 9: INFO `kind="c2.vpr.ready"` with `{strategy: "net_vlad", descriptor_dim: 4096}`; ERROR / WARN identical to UltraVPR.
+- FDR records emitted: `kind="vpr.embed_query"`, `kind="vpr.backbone_error"`, `kind="vpr.preprocess_error"`.
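
The `embed_query` steps above can be sketched as follows. This is a minimal sketch, not the shipped `NetVladStrategy`: the class name `NetVladEmbedSketch`, the `report_error` callback (standing in for the ERROR log plus FDR record), the `frame.frame_id` attribute, and the local `VprQuery` / `VprBackboneError` definitions are assumptions made so the block is self-contained; the real types come from the AZ-336 contract.

```python
from dataclasses import dataclass
from time import monotonic_ns
from typing import Any, Callable, Sequence


class VprBackboneError(Exception):
    """Stand-in for the contract's error type: the forward pass failed."""


@dataclass(frozen=True)
class VprQuery:
    """Stand-in for the contract's query record."""
    frame_id: int
    embedding: Sequence[float]
    produced_at: int  # monotonic clock, nanoseconds


class NetVladEmbedSketch:
    """Hypothetical illustration of embed_query steps 1-6."""

    def __init__(self, preprocessor: Any, runtime: Any, normaliser: Any,
                 report_error: Callable[[str, str], None],
                 engine_id: str, num_clusters: int = 64) -> None:
        self._preprocessor = preprocessor
        self._runtime = runtime
        self._normaliser = normaliser
        self._report_error = report_error
        self._engine_id = engine_id
        self._num_clusters = num_clusters

    def embed_query(self, frame: Any, calibration: Any) -> VprQuery:
        # Step 1: frame -> input tensor.
        tensor = self._preprocessor.preprocess(frame, calibration)
        try:
            # Step 2: forward pass. Guarding only this call is an assumption;
            # the spec does not say how wide the try block should be.
            outputs = self._runtime.forward(self._engine_id, {"input": tensor})
        except RuntimeError as exc:
            # Step 6: report, then wrap in the backbone error type.
            self._report_error("vpr.backbone_error", str(exc))
            raise VprBackboneError(str(exc)) from exc
        intermediate = outputs["vlad_descriptor"]
        # Step 3: intra-cluster L2 first.
        intra = self._normaliser.intra_cluster_normalise(
            intermediate[0], num_clusters=self._num_clusters)
        # Step 4: then global L2.
        embedding = self._normaliser.l2_normalise(intra)
        # Step 5: stamp the result with the monotonic clock.
        return VprQuery(frame.frame_id, embedding, produced_at=monotonic_ns())
```

Because every collaborator is injected, a normaliser double can record call order (AC-3) and a runtime double can raise `RuntimeError` (AC-8) without loading PyTorch.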
+
+## Scope
+
+### Included
+
+- `NetVladStrategy` implementing the Protocol; `NetVladBackbonePreprocessor` implementing `BackbonePreprocessor`.
+- Module-level `create(config, tile_store, inference_runtime)` factory entry-point.
+- Intra-cluster L2 normalisation BEFORE global L2 normalisation (NetVLAD's published preprocessing chain).
+- Composition-root wiring for `config.vpr.strategy == "net_vlad"`.
+- Engine output shape assertion at load time.
+- Logging + FDR records identical to UltraVPR (the per-backbone label distinguishes the records).
+- Unit tests covering all 7 invariants, the dual-stage normalisation, the preprocessing contract, the load-time shape assertion.
+- `BUILD_VPR_NETVLAD` CMake flag wiring per ADR-002 (ON for research; OFF for airborne / operator-tooling because PyTorch runtime is excluded; ON-but-effectively-unused for replay-cli unless explicitly selected).
+
+### Excluded
+
+- The `VprStrategy` Protocol — owned by AZ-336.
+- The `DescriptorNormaliser.l2_normalise` — already AZ-283. **Note**: AZ-283 ships `l2_normalise`; this task may need to extend AZ-283 to add `intra_cluster_normalise(vec, num_clusters)`. **Decision**: extending AZ-283 is in scope here as a small contract addition (the helper ships with `l2_normalise`; adding `intra_cluster_normalise` is a single function). If AZ-283 is already merged when this task starts, the addition is a backward-compatible function add; no breaking change.
+- The C7 PyTorch runtime — owned by AZ-300; this task consumes the interface.
+- Other backbones — owned by AZ-337 (UltraVPR), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD).
+- FAISS retrieve wiring — owned by AZ-341.
+- C2-IT-01's NetVLAD recall@10 ≥ 0.85 acceptance test — deferred to Step 9 / E-BBT.
+
+## Acceptance Criteria
+
+**AC-1: Protocol conformance**
+Given a constructed `NetVladStrategy` instance
+When `isinstance(strategy, VprStrategy)` is evaluated
+Then the result is `True`
+
+**AC-2: `embed_query` produces L2-normalised FP16 (descriptor_dim,) embedding**
+Given a valid `NavCameraFrame` and `CameraCalibration`
+When `strategy.embed_query(frame, calibration)` is called
+Then `embedding.shape == (4096,)` (or the configured `descriptor_dim`), `embedding.dtype == np.float16`, `||embedding||_2 == 1.0 ± 1e-3`
+
+**AC-3: Dual-stage normalisation — intra-cluster THEN global L2**
+Given a fake intermediate VLAD descriptor with non-zero per-cluster sub-vectors
+When the embedding pipeline runs
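+Then `intra_cluster_normalise` is called BEFORE `l2_normalise` (verifiable via spy on the normaliser); the order is NEVER reversed; after the intra-cluster stage every non-zero per-cluster sub-vector has unit L2 norm, AND the final vector has unit L2 norm globally (the global stage rescales all sub-vectors by the same factor, so with 64 non-zero clusters each ends at norm 1/√64 = 0.125)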
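
A minimal sketch of the two normalisation stages in pure Python, to make the ordering in AC-3 concrete. It assumes the VLAD descriptor is laid out as `num_clusters` contiguous, equal-width blocks and that an all-zero block stays zero; neither detail is fixed by this spec, and the real helper lives in AZ-283's `DescriptorNormaliser`. `block_norms` is a hypothetical inspection helper.

```python
import math
from typing import List, Sequence


def l2_normalise(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale vec to unit L2 norm; an (almost) zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm < eps:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def intra_cluster_normalise(vec: Sequence[float], num_clusters: int) -> List[float]:
    """L2-normalise each contiguous per-cluster block independently."""
    if num_clusters <= 0 or len(vec) % num_clusters != 0:
        raise ValueError("descriptor length must be a multiple of num_clusters")
    width = len(vec) // num_clusters
    out: List[float] = []
    for start in range(0, len(vec), width):
        out.extend(l2_normalise(vec[start:start + width]))
    return out


def block_norms(vec: Sequence[float], num_clusters: int) -> List[float]:
    """L2 norm of each per-cluster block, for inspection in tests."""
    width = len(vec) // num_clusters
    return [
        math.sqrt(sum(x * x for x in vec[start:start + width]))
        for start in range(0, len(vec), width)
    ]


# Stage order required by the spec: intra-cluster first, then global L2.
raw = [float(i + 1) for i in range(16)]
intra = intra_cluster_normalise(raw, num_clusters=4)
embedding = l2_normalise(intra)
```

With 4 non-zero clusters, each block has norm 1 after the intra-cluster stage and norm 1/√4 = 0.5 after the global stage, while the full vector has norm 1.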
+
+**AC-4: `embed_query` is deterministic**
+Given the same frame + calibration
+When `embed_query` is called 3 times
+Then all three returns have bit-exact `embedding` arrays (ULP-tolerant FP16)
+
+**AC-5: `retrieve_topk` returns exactly k candidates with `backbone_label = "net_vlad"`**
+Given a corpus of 100 tiles + a constructed `VprQuery` with D=4096
+When `strategy.retrieve_topk(query, k=10)` is called
+Then `len(candidates) == 10`; sorted ascending; `backbone_label == "net_vlad"`; `candidates[0].descriptor_dim == 4096`
+
+**AC-6: `descriptor_dim()` is config-driven and stable**
+Given construction with `descriptor_dim=4096`
+When `descriptor_dim()` is called 100 times
+Then every call returns 4096; constructing a second instance with `descriptor_dim=512` (PCA-whitened weights case) returns 512 from that instance's `descriptor_dim()`
+
+**AC-7: Engine output shape mismatch at load → `ConfigurationError`**
+Given a model whose output tensor shape is `(1, 2048)` while `config.vpr.netvlad_descriptor_dim = 4096`
+When the module-level `create(...)` factory in `net_vlad.py` is called
+Then `ConfigurationError` is raised with message containing `"engine output shape mismatch: expected (1, 4096), got (1, 2048)"`; the strategy is NOT instantiated
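
The load-time check in AC-7 can be sketched as a small guard. This is an illustration only: `assert_engine_output_shape` is a hypothetical helper name, `ConfigurationError` is redefined locally so the block runs on its own, and how the output shape is read from the loaded model is left to the C7 runtime (the spec does not describe that call).

```python
from typing import Sequence


class ConfigurationError(Exception):
    """Stand-in for the project's error type."""


def assert_engine_output_shape(actual_shape: Sequence[int], descriptor_dim: int) -> None:
    """Raise ConfigurationError unless the model emits (1, descriptor_dim)."""
    expected = (1, descriptor_dim)
    actual = tuple(actual_shape)
    if actual != expected:
        # Message format follows the wording quoted in AC-7.
        raise ConfigurationError(
            f"engine output shape mismatch: expected {expected}, got {actual}"
        )
```

Running the guard before the strategy is constructed keeps the "strategy is NOT instantiated" half of AC-7 trivially true.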
+
+**AC-8: `VprBackboneError` on forward-pass failure**
+Given an `InferenceRuntime` test double that raises `RuntimeError` from `forward`
+When `embed_query` is called
+Then `VprBackboneError` is raised; ERROR log + FDR record emitted
+
+**AC-9: `VprPreprocessError` on corrupt image bytes**
+Given a frame with malformed `image_bytes`
+When `embed_query` is called
+Then `VprPreprocessError` is raised; ERROR log + FDR record emitted
+
+**AC-10: Composition-root wiring**
+Given `config.vpr.strategy = "net_vlad"` AND valid weights AND matching `descriptor_dim`
+When `compose_root(config)` runs
+Then a `NetVladStrategy` is wired; AZ-336 factory's pre-flight `descriptor_dim` validation passes; INFO log `kind="c2.vpr.ready"` with `{strategy: "net_vlad", descriptor_dim: 4096}` emitted
+
+**AC-11: Build-flag combination — NetVLAD requires PyTorch runtime**
+Given `config.vpr.strategy = "net_vlad"` AND `BUILD_PYTORCH_RUNTIME=OFF` (airborne binary)
+When the binary tries to load
+Then `ConfigurationError` is raised at composition-root time with message containing `"NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; this binary has BUILD_PYTORCH_RUNTIME=OFF"`; the binary refuses to start (fail-fast)
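
A minimal sketch of the fail-fast check in AC-11, assuming the build flags are visible to the composition root as a plain mapping of flag name to boolean. That mapping, the lookup tables, and the function name `check_strategy_buildable` are hypothetical; the spec does not say how CMake flags are surfaced at runtime.

```python
from typing import Mapping


class ConfigurationError(Exception):
    """Stand-in for the project's error type."""


# Hypothetical lookup tables: which build flags each strategy needs,
# and the display name used in the error message.
_REQUIRED_FLAGS = {"net_vlad": ("BUILD_PYTORCH_RUNTIME",)}
_DISPLAY_NAMES = {"net_vlad": "NetVLAD"}


def check_strategy_buildable(strategy: str, build_flags: Mapping[str, bool]) -> None:
    """Raise ConfigurationError if the selected strategy needs a flag that is OFF."""
    for flag in _REQUIRED_FLAGS.get(strategy, ()):
        if not build_flags.get(flag, False):
            name = _DISPLAY_NAMES.get(strategy, strategy)
            raise ConfigurationError(
                f"{name} requires {flag}=ON; this binary has {flag}=OFF"
            )
```

Calling this at composition-root time, before any strategy is constructed, is what turns a config typo into a startup failure rather than an in-flight fallback.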
+
+## Non-Functional Requirements
+
+**Performance**
+- `embed_query` p95 ≤ 80 ms on Tier-1 Jetson Orin with PyTorch FP16 — looser than UltraVPR's 60 ms because the simple-baseline runs on the simpler runtime; not on the production critical path.
+- `retrieve_topk` p95 ≤ 4 ms — slightly looser than UltraVPR because the higher embedding dim (4096 vs 512) makes FAISS lookup ~ 8× more compute; still sub-frame at 3 Hz.
+- GPU memory: ≤ 800 MB resident for backbone weights — looser than UltraVPR's 600 MB because NetVLAD's VGG16 backbone is larger.
+- These NFRs are not enforced as engine-rule blockers; they're operator guidance for the research binary's resource budget.
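
One way to check the p95 budgets above during development is a nearest-rank percentile over repeated timings. The sketch below is an assumption on both counts: the spec does not say which percentile definition or measurement harness the project uses, and `p95` / `measure_p95_ms` are hypothetical helper names.

```python
import time
from typing import Callable, List, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer arithmetic for ceil(95 * n / 100) avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_p95_ms(fn: Callable[[], object], repeats: int = 200) -> float:
    """Time fn() `repeats` times and return the p95 latency in milliseconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return p95(samples)
```

For example, `measure_p95_ms(lambda: strategy.embed_query(frame, calibration))` could be compared against the 80 ms guidance, keeping in mind that these NFRs are guidance rather than blockers.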
+
+**Compatibility**
+- The PyTorch state_dict format is owned by C7's PyTorch runtime (AZ-300); this task consumes the produced model via `config.vpr.backbone_weights_path`.
+- The upstream NetVLAD code drop is pinned per Plan-phase; PCA-whitening parameters change with weights → AZ-280 sidecar carries them.
+
+**Reliability**
+- Strategy is single-threaded by contract (INV-1).
+- Dual-stage normalisation order (intra-cluster THEN global L2) is mandatory; reversing the order leaves the embedding with a global norm of √K (K = number of non-zero clusters) instead of 1, which violates INV-3 and can silently break AC-2.1b (recall regression).
+- `VprBackboneError` does not crash the process; downstream falls back to VIO-only.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | `isinstance(NetVladStrategy(...), VprStrategy)` | `True` |
+| AC-2 | `embed_query` output | shape (4096,), dtype float16, L2-norm == 1.0 ± 1e-3 |
+| AC-3 | Spy on normaliser methods | `intra_cluster_normalise` called BEFORE `l2_normalise` exactly once each per `embed_query` |
+| AC-4 | `embed_query` × 3 same frame | bit-exact embeddings |
+| AC-5 | `retrieve_topk` against fixture corpus | `len == 10`, sorted, `backbone_label == "net_vlad"`, `descriptor_dim == 4096` |
+| AC-6 | `descriptor_dim()` × 100 (D=4096 instance) + a second D=512 instance | first instance always 4096; second always 512 |
+| AC-7 | Model with wrong output shape | `ConfigurationError` at create time |
+| AC-8 | `forward` raises | `VprBackboneError`; ERROR log + FDR |
+| AC-9 | malformed `image_bytes` | `VprPreprocessError`; ERROR log + FDR |
+| AC-10 | `compose_root(config)` with `config.vpr.strategy = "net_vlad"` | wired; INFO log with `{strategy: "net_vlad", descriptor_dim: 4096}` |
+| AC-11 | airborne binary + `config.vpr.strategy = "net_vlad"` | `ConfigurationError` with PyTorch-OFF message; fail-fast |
+| Preprocess-shape | `preprocessor.preprocess(frame, calibration)` output | shape `(1, 3, 480, 480)`, dtype float16 |
+| Preprocess-input-shape | `preprocessor.input_shape()` | returns `(480, 480)` |
+
+## Constraints
+
+- **Dual-stage normalisation order is non-negotiable** — intra-cluster THEN global L2. Reversing is forbidden.
+- **NetVLAD uses the PyTorch runtime, NOT TensorRT** — the simple-baseline policy isolates it from TRT engine compile risk. The research binary links both runtimes; airborne binary excludes the PyTorch runtime via `BUILD_PYTORCH_RUNTIME=OFF`, which makes NetVLAD effectively unselectable for airborne (AC-11).
+- **Preprocessing parameters are weights-coupled** — `(480, 480)` resize, ImageNet mean/std. Hard-coded; not config-knobs.
+- **`descriptor_dim` IS config-driven** (unlike UltraVPR which hard-codes 512) because NetVLAD ships in two flavours: full 4096-d and PCA-whitened 512-d. The choice is part of the operator's deployment, not a runtime decision.
+- **Constructor injection only**; no `import gps_denied_onboard.config` inside the strategy module.
+- **The strategy holds the engine ID, NOT the engine itself** — engine lifecycle is owned by C7.
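
A minimal sketch of what hard-coded, weights-coupled preprocessing parameters look like in practice: module-level constants rather than config lookups. It covers only the per-pixel normalisation and the HWC to NCHW reordering in pure Python; decoding, centre-crop, OpenCV resize, and the FP16 cast are omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Coupled to the NetVLAD weights, so deliberately constants and not config knobs.
INPUT_SHAPE: Tuple[int, int] = (480, 480)
MEAN: Tuple[float, float, float] = (0.485, 0.456, 0.406)
STD: Tuple[float, float, float] = (0.229, 0.224, 0.225)


def normalise_pixel(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Apply (pixel / 255.0 - mean) / std per channel to one RGB pixel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD))


def hwc_to_nchw(image: Sequence[Sequence[Sequence[float]]]) -> List[List[List[List[float]]]]:
    """Normalise an H x W x 3 image and reorder it to 1 x 3 x H x W."""
    height, width = len(image), len(image[0])
    planes = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y, row in enumerate(image):
        for x, rgb in enumerate(row):
            for channel, value in enumerate(normalise_pixel(rgb)):
                planes[channel][y][x] = value
    return [planes]  # leading list is the batch dimension of size 1
```

Keeping the constants next to the code that uses them means a change of weights forces a code change and review, which is the point of the constraint.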
+
+## Risks & Mitigation
+
+**Risk 1: NetVLAD embedding dim of 4096 is 8× larger than UltraVPR's 512; FAISS HNSW lookup is slower**
+- *Risk*: `retrieve_topk` may exceed C2-PT-01's 2 ms budget for the lookup stage; the budget was set against UltraVPR's D=512.
+- *Mitigation*: `retrieve_topk` p95 ≤ 4 ms is the looser baseline budget (acknowledged in NFRs); for the research binary this is acceptable since NetVLAD is comparison-only. If an operator wants the production-fast path with NetVLAD, they configure PCA-whitening (D=512) at corpus build time (C10).
+
+**Risk 2: NetVLAD recall@10 ≥ 0.85 floor not achievable with FP16**
+- *Risk*: FP16 precision loss may degrade the VLAD aggregation enough to push recall@10 below the relaxed engine-rule floor.
+- *Mitigation*: C2-IT-01's NetVLAD assertion is the validation gate (deferred to Step 9). If FP16 fails, the operator can configure FP32 weights — the strategy does not hard-code dtype; it follows the runtime's loaded model.
+
+**Risk 3: PyTorch FP16 runtime on Tier-1 Jetson is slower than expected**
+- *Risk*: PyTorch FP16 inference on Jetson has known pipeline-stall issues compared to TRT.
+- *Mitigation*: NetVLAD is research-only by build-flag combination (AC-11 enforces); the production critical path is UltraVPR. If a future cycle wants NetVLAD on the airborne binary, that's a separate task: convert NetVLAD to ONNX → TRT engine, then update this strategy to use the TRT runtime.
+
+**Risk 4: Operator picks NetVLAD on airborne binary by mistake**
+- *Risk*: A typo in the airborne config that selects `net_vlad` would silently fall back to VIO-only every flight if the runtime were missing.
+- *Mitigation*: AC-11 makes this fail-fast at composition-root time with a clear error message. Operators learn at startup, not after takeoff.
+
+**Risk 5: AZ-283 `DescriptorNormaliser` may not yet ship `intra_cluster_normalise`**
+- *Risk*: The helper as defined in AZ-283 ships only `l2_normalise`; this task needs `intra_cluster_normalise` too.
+- *Mitigation*: As noted in Scope/Excluded, extending AZ-283 to add `intra_cluster_normalise` is a backward-compatible function addition. If AZ-283 has already merged before this task starts, the addition is committed alongside this task with a one-line note in `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md`. If AZ-283 has not yet merged, coordinate the addition during AZ-283's implementation. Either way, no breaking change to existing consumers.
+
+## Runtime Completeness
+
+- **Named capability**: mandatory simple-baseline `VprStrategy` for engine-rule comparative validation against the production-default UltraVPR (architecture / E-C2 / `solution.md` "NetVLAD mandatory simple-baseline" / engine rule + AC-2.1b relaxed floor).
+- **Production code that must exist**: real `NetVladStrategy` calling real C7 PyTorch `InferenceRuntime.forward` with a real loaded NetVLAD `.pth` model; real `NetVladBackbonePreprocessor` performing real OpenCV resize + ImageNet normalisation + FP16 cast; real dual-stage normalisation (intra-cluster THEN global L2); real composition-root wiring path.
+- **Allowed external stubs**: tests MAY use `FakeInferenceRuntime` returning pre-computed VLAD descriptors; `FakeTileStore`; `FakeFdrClient`; `FakeDescriptorNormaliser` instrumented to verify call order (AC-3); production wiring uses the real C7 PyTorch runtime + real NetVLAD weights + real C6.
+- **Unacceptable substitutes**: a NumPy-only NetVLAD forward pass (would not satisfy NFR-perf budget; would defeat the runtime-isolation strategy of using a different runtime than UltraVPR); skipping intra-cluster normalisation (would silently break AC-2.1b's recall floor); using TensorRT for NetVLAD (would defeat the simple-baseline policy of isolating runtime risk); making preprocessing parameters config-knobs (would let operators silently break the recall floor); selecting NetVLAD in an airborne binary (must fail-fast per AC-11); a single-stage L2-only normalisation (would deviate from NetVLAD's published preprocessing chain; recall regression risk).