- _docs/02_tasks/todo/AZ-338_c2_net_vlad.md -> _docs/02_tasks/done/AZ-338_c2_net_vlad.md
- _docs/03_implementation/batch_46_cycle1_report.md (new)
- _docs/_autodev_state.md: last_completed_batch 45 -> 46; sub_step.detail "batch 46 complete - selecting batch 47"

AZ-338 transitioned in Jira: In Progress -> In Testing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 46 / Cycle 1 — Implementation Report
- Date: 2026-05-13
- Tasks: AZ-338, C2 NetVLAD Mandatory Simple-Baseline (3pt)
- Total complexity: 3 points
- Result: PASS_WITH_WARNINGS (per-batch code review)
- Jira tracker state: AZ-338 transitioned To Do → In Progress → In Testing
Scope
NetVLAD is the C2 comparative baseline mandated by the engine rule
(every production-default backbone ships with a simple-baseline
alongside; description.md § 1). Per § 5 the baseline runs on the C7
PyTorch FP16 runtime (NOT TensorRT). The runtimes are kept separate so
that a TRT engine compile bug cannot break the baseline and the primary
at the same time. This batch lands the first concrete VprStrategy
implementation, validating the AZ-341 FaissBridge plumbing and
establishing the pattern that AZ-337 / AZ-339 / AZ-340 follow.
Files Changed
Production (new)
- `src/gps_denied_onboard/components/c2_vpr/net_vlad.py`: `NetVladStrategy` class implementing the `VprStrategy` Protocol. Constructor wires `InferenceRuntimeCut` + `DescriptorIndexCut` + `NetVladBackbonePreprocessor` + `DescriptorNormaliser` + `FaissBridge`. Module-level `MODEL_NAME` + `architecture_factory()` exposed for the composition root to bind to C7's architecture registry. Module-level `create(config, descriptor_index, inference_runtime)` factory consumed by `build_vpr_strategy`.
- `src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py`: canonical NetVLAD-VGG16 architecture (Arandjelović et al. 2016): VGG16 feature extractor up to conv5_3 + NetVLAD pooling layer (soft-assign 1x1 conv + cluster centroids + batched residual aggregation) + optional `nn.Linear(K*D, descriptor_dim)` PCA projection. K=64, D=512 (VGG16 conv5_3 channels), default descriptor_dim=4096. Torch / torchvision imported lazily inside the factory.
- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py`: `NetVladBackbonePreprocessor` implementing the C2-internal `BackbonePreprocessor` Protocol. Decode → centre-crop square → resize (480, 480) → ImageNet mean/std → FP16 NCHW.
- `src/gps_denied_onboard/components/c2_vpr/inference_runtime_cut.py`: NEW AZ-507 consumer-side cut mirroring the subset of C7 `InferenceRuntime` that C2 strategies consume (`compile_engine`, `deserialize_engine`, `infer`, `release_engine`, `current_runtime_label`). Lets `c2_vpr` stay AZ-507-clean.
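To illustrate what a consumer-side cut looks like, here is a minimal sketch of a structural Protocol carrying the five method names listed above. The method names and the `infer(handle, inputs)` / `compile_engine(model_path, build_config)` / `deserialize_engine(entry)` shapes come from this report; every type annotation, the `release_engine` parameter, the `FakeRuntime` class and the `"pytorch_fp16"` label are assumptions made for illustration, not the project's actual code.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InferenceRuntimeCut(Protocol):
    """Consumer-side view of the runtime methods a C2 strategy calls.

    Parameter and return types are assumptions, not the real C7 contract.
    """

    def compile_engine(self, model_path: str, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: Any) -> Any: ...
    def release_engine(self, handle: Any) -> None: ...
    def current_runtime_label(self) -> str: ...


class FakeRuntime:
    """Hypothetical runtime; it satisfies the cut without inheriting from it."""

    def compile_engine(self, model_path, build_config):
        return {"path": model_path, "config": build_config}

    def deserialize_engine(self, entry):
        return ("handle", entry["path"])

    def infer(self, handle, inputs):
        return list(inputs)

    def release_engine(self, handle):
        return None

    def current_runtime_label(self):
        return "pytorch_fp16"  # hypothetical label


class NotARuntime:
    """Implements only one of the five methods, so it fails the check."""

    def infer(self, handle, inputs):
        return inputs


# runtime_checkable Protocols test for the presence of every member.
runtime_ok = isinstance(FakeRuntime(), InferenceRuntimeCut)
runtime_bad = isinstance(NotARuntime(), InferenceRuntimeCut)
```

Because the check is structural, the consuming component only needs this Protocol in its own package and never has to import the concrete runtime.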
Production (modified)
- `src/gps_denied_onboard/components/c2_vpr/config.py`: added `netvlad_descriptor_dim: int = 4096` knob + `__post_init__` validation.
- `src/gps_denied_onboard/components/c2_vpr/__init__.py`: re-exported `InferenceRuntimeCut`.
- `src/gps_denied_onboard/helpers/descriptor_normaliser.py`: added `intra_cluster_normalise(descriptor, num_clusters)` for NetVLAD's dual-stage normalisation chain. Backward-compatible function addition (v1.0.0 → v1.1.0).
- `src/gps_denied_onboard/fdr_client/records.py`: registered three new record kinds: `vpr.embed_query`, `vpr.backbone_error`, `vpr.preprocess_error`.
- `src/gps_denied_onboard/runtime_root/vpr_factory.py`: added `_register_strategy_architecture` helper that binds `(MODEL_NAME, architecture_factory(descriptor_dim))` to C7's architecture registry before delegating to the strategy's `create()` factory. Keeps the c7 import at L4, preserves AZ-507.
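The registration flow described for `vpr_factory.py` can be sketched as follows. This is a simplified illustration under stated assumptions: the registry is a plain dict standing in for C7's architecture registry, the strategy module is faked with a `SimpleNamespace`, `MODEL_NAME = "net_vlad"` is a guessed value, and the argument lists of `build_vpr_strategy` and `_register_strategy_architecture` are hypothetical. Only the ordering (register first, then call `create()`) is taken from the report.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

calls: List[str] = []  # records call order for the demonstration


def architecture_factory(descriptor_dim: int) -> Callable[[], Dict[str, Any]]:
    """Return a zero-argument builder closed over descriptor_dim."""

    def build() -> Dict[str, Any]:
        # A real factory would import torch lazily here and build the module.
        return {"arch": "netvlad_vgg16", "descriptor_dim": descriptor_dim}

    return build


def create(config, descriptor_index, inference_runtime):
    calls.append("create")
    return {"label": "net_vlad", "runtime": inference_runtime}


# Stand-in for the strategy module: it exposes names and imports nothing from C7.
net_vlad = SimpleNamespace(
    MODEL_NAME="net_vlad",  # assumed value
    architecture_factory=architecture_factory,
    create=create,
)


def _register_strategy_architecture(strategy_module, descriptor_dim, registry) -> None:
    """Composition-root helper: the only place that touches the registry."""
    calls.append("register")
    registry[strategy_module.MODEL_NAME] = strategy_module.architecture_factory(
        descriptor_dim
    )


def build_vpr_strategy(strategy_module, config, descriptor_index, inference_runtime, registry):
    _register_strategy_architecture(
        strategy_module, config.netvlad_descriptor_dim, registry
    )
    return strategy_module.create(config, descriptor_index, inference_runtime)


registry: Dict[str, Callable[[], Dict[str, Any]]] = {}
config = SimpleNamespace(netvlad_descriptor_dim=4096)
strategy = build_vpr_strategy(
    net_vlad, config, descriptor_index=None,
    inference_runtime="fake-runtime", registry=registry,
)
```

The strategy module stays free of cross-component imports because it only publishes a name and a factory; the binding itself lives in the composition root.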
Tests
- `tests/unit/c2_vpr/test_net_vlad.py`: NEW, 31 tests covering all 11 ACs (with per-AC variants for AC-2, AC-6, AC-8, AC-9, AC-11) + preprocessor contract (4) + constructor validation (3) + FDR record emission (2) + architecture-factory closure (2).
- `tests/unit/test_az283_descriptor_normaliser.py`: +8 tests for `intra_cluster_normalise`: per-cluster unit norm, dtype preservation, zero-cluster handling, non-divisible length rejection, 2-D rejection, zero/bool `num_clusters` rejection, float64 rejection, no-mutation invariant.
- `tests/unit/test_az272_fdr_record_schema.py`: +3 fixture payloads for the three new VPR record kinds so the schema roundtrip test exercises all of them.
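The behaviours those normaliser tests exercise can be illustrated with a small pure-Python sketch. It is not the project's helper: the real function presumably operates on FP16 arrays, while this version uses lists of floats. Raising `ValueError` on bad input and leaving an all-zero cluster unchanged are assumptions, since the report only says these cases are handled or rejected. The example also shows why the order matters: applying `l2_normalise` last yields a unit-norm descriptor, whereas the reversed order does not.

```python
import math
from typing import List


def vector_norm(vec: List[float]) -> float:
    return math.sqrt(sum(x * x for x in vec))


def l2_normalise(vec: List[float]) -> List[float]:
    """Scale to unit length; an all-zero vector is returned unchanged (assumption)."""
    norm = vector_norm(vec)
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def intra_cluster_normalise(descriptor: List[float], num_clusters: int) -> List[float]:
    """L2-normalise each of the num_clusters equal-width slices independently."""
    if isinstance(num_clusters, bool) or not isinstance(num_clusters, int):
        raise ValueError("num_clusters must be an int")
    if num_clusters <= 0:
        raise ValueError("num_clusters must be positive")
    if len(descriptor) % num_clusters != 0:
        raise ValueError("descriptor length must be divisible by num_clusters")
    width = len(descriptor) // num_clusters
    out: List[float] = []
    for k in range(num_clusters):
        out.extend(l2_normalise(descriptor[k * width:(k + 1) * width]))
    return out  # new list; the input is not mutated


# Toy descriptor: 3 clusters of width 2, the middle cluster is all zeros.
raw = [3.0, 4.0, 0.0, 0.0, 1.0, 0.0]

correct = l2_normalise(intra_cluster_normalise(raw, 3))
reversed_order = intra_cluster_normalise(l2_normalise(raw), 3)

correct_norm = vector_norm(correct)          # unit norm
reversed_norm = vector_norm(reversed_order)  # sqrt(2): two non-zero unit clusters
```

In the reversed order every non-zero cluster ends up with unit norm, so the overall norm grows with the number of non-zero clusters instead of staying at 1.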
Docs
- `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md`: v1.0.0 → v1.1.0; added `intra_cluster_normalise` row + changelog entry.
- `_docs/03_implementation/reviews/batch_46_review.md`: per-batch code review (PASS_WITH_WARNINGS).
Acceptance Criteria Coverage
All 11 ACs of AZ-338 have at least one covering unit test:
| AC | Description | Status |
|---|---|---|
| AC-1 | Protocol conformance | covered |
| AC-2 | L2-norm == 1.0 ± 1e-3 FP16 (D,) | covered (4096 + 512 variants) |
| AC-3 | `intra_cluster_normalise` BEFORE `l2_normalise` | covered (spy + once-each) |
| AC-4 | Deterministic across 3 calls | covered |
| AC-5 | `retrieve_topk` == k, label="net_vlad", sorted | covered |
| AC-6 | `descriptor_dim()` stable | covered (4096 + 512 variants) |
| AC-7 | Engine output shape mismatch → ConfigError | covered |
| AC-8 | `VprBackboneError` on forward failure | covered (3 variants) |
| AC-9 | `VprPreprocessError` on corrupt image | covered (3 variants) |
| AC-10 | Composition-root wiring + `c2.vpr.ready` log | covered (log + model_name forcing) |
| AC-11 | `BUILD_PYTORCH_RUNTIME=OFF` → ConfigError fail-fast | covered (tensorrt + onnx_trt_ep variants) |
Test Results
- Full unit suite: `1608 passed / 80 environment-skipped / 0 failed` in ~81s. Up from 1565 at the close of Batch 45 (+43 new tests).
- Focused per-component: `c2_vpr/test_net_vlad.py` 31/31 PASS; `c2_vpr/test_faiss_bridge.py` 22/22 PASS (no regression); `test_az283_descriptor_normaliser.py` 23/23 PASS (15 original + 8 new); `test_az272_fdr_record_schema.py` 3/3 PASS.
- Lint: `ruff check` clean on every new + modified file.
- AZ-507 layering lint: `test_ac6_only_compose_root_imports_concrete_strategies` PASS.
Architectural Decisions
- Architecture-registration moved to composition root. The AZ-338 spec implied `c2_vpr.net_vlad.create()` registers the NetVLAD nn.Module factory with C7. That violates AZ-507 (no cross-component imports). Resolved by exposing `MODEL_NAME` + `architecture_factory(descriptor_dim)` on the strategy module and having `runtime_root/vpr_factory.py::_register_strategy_architecture` perform the c7 binding before calling the strategy's `create()` factory. Pattern is generalisable to AZ-337 / AZ-339 / AZ-340 strategies that also use the PyTorch runtime.
- C7 API names aligned with v1.0.0 Protocol. The spec uses `runtime.forward(engine_id, ...)` and `inference_runtime.load_engine(weights_path)`. Live C7 Protocol (AZ-297) is `infer(handle, inputs)` + `compile_engine(model_path, build_config) → entry` + `deserialize_engine(entry) → handle`. Implementation aligns with the v1.0.0 Protocol; spec § Outcome is stale on these names (flagged as F2 in code review).
- InferenceRuntimeCut. New AZ-507 consumer-side cut joins `DescriptorIndexCut` (AZ-341), `TileDownloaderCut` (AZ-328), `TileUploaderCut` (AZ-329), `FdrFooterReader` (AZ-329): five structural Protocol cuts across the codebase, all named `*Cut` or `*Reader`, all `runtime_checkable=True`. Pattern is stable.
- Dual-stage normalisation order. `intra_cluster_normalise` BEFORE `l2_normalise` is mandatory per AZ-338 spec § Constraints and per the published Pittsburgh NetVLAD preprocessing chain. Verified by AC-3 spy. Reversing the order would silently break AC-2.1b (recall regression).
- PCA-projection inside the architecture. The published Pittsburgh NetVLAD reference ships with a learned `Linear(K*D=32768, 4096)` PCA-whitening layer. The architecture embeds the layer as `nn.Linear(K*D, descriptor_dim)`; when `descriptor_dim == K*D` the layer is omitted (raw VLAD). Default `descriptor_dim=4096` per the spec. The `.pth` state dict is expected to carry the PCA weights alongside the rest of the model parameters.
- NetVLAD pooling implemented via `torch.bmm`, not a Python `for k in range(K)` loop. Single CUDA kernel; asymptotically equivalent (K=64) but dramatically faster on GPU than the reference Python-loop form.
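The equivalence between the per-cluster loop and a matrix-multiply formulation of VLAD residual aggregation can be checked on toy data. With soft-assignments a[n][k], local features x[n] and centroids c[k], the aggregated residual is V[k] = Σ_n a[n][k]·(x[n] − c[k]) = (AᵀX)[k] − (Σ_n a[n][k])·c[k]. The sketch below is a pure-Python stand-in, not the project's code: the helper names are hypothetical, `matmul` plays the role that `torch.bmm` would play on a batch, and the exact tensor layout used in `_net_vlad_architecture.py` is not shown in this report.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def vlad_loop(x: Matrix, assign: Matrix, centroids: Matrix) -> Matrix:
    """Reference form: one pass per cluster k, accumulating weighted residuals."""
    out: Matrix = []
    for k, centroid in enumerate(centroids):
        row = [0.0] * len(centroid)
        for feat, weights in zip(x, assign):
            for d in range(len(centroid)):
                row[d] += weights[k] * (feat[d] - centroid[d])
        out.append(row)
    return out


def vlad_matmul(x: Matrix, assign: Matrix, centroids: Matrix) -> Matrix:
    """Matrix form: A^T X minus each centroid scaled by its total assignment mass."""
    weighted_sum = matmul(transpose(assign), x)   # shape (K, D)
    mass = [sum(col) for col in zip(*assign)]     # shape (K,)
    return [
        [weighted_sum[k][d] - mass[k] * centroids[k][d] for d in range(len(centroids[k]))]
        for k in range(len(centroids))
    ]


# Toy problem: N=3 local features, D=2 channels, K=2 clusters.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
soft_assign = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
centres = [[1.0, 1.0], [4.0, 4.0]]

loop_result = vlad_loop(features, soft_assign, centres)
matmul_result = vlad_matmul(features, soft_assign, centres)
```

Both forms perform the same number of multiply-adds; the matrix form simply expresses them as one dense product that a GPU library can execute without a Python-level loop over clusters.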
Carried-over Findings
- F1 from cumulative review 43-45: `_iso_ts_from_clock` duplicated across 6 modules; `c2_vpr/net_vlad.py` is the 6th copy. AZ-508 hygiene PBI exists for consolidation.
New Findings (per Batch 46 code review)
- F2 (Low, Spec-Hygiene): AZ-338 spec § Outcome uses outdated C7 API names + implies an AZ-507-violating architecture-registration location. Recommend a spec-hygiene follow-up that refreshes AZ-337..AZ-340 against the stabilised C7 v1.0.0 + AZ-507 patterns.
- F3 (Low, Test-Coverage): NFR-perf microbench (`embed_query` p95 ≤ 80ms on Tier-1 Jetson) deferred: no real PyTorch CUDA host on the dev tier. Schedule under FT-P-19 / C2-IT-01 on Tier-1 hardware.
- F4 (Low, Architecture): PCA-projection sidecar verification deferred: the architecture loads PCA weights via `load_state_dict(strict=True)` only; no cross-check against an AZ-280 sidecar manifest yet.
Jira Tracker
- AZ-338 transitioned: To Do → In Progress → In Testing.
- Task spec archived: `_docs/02_tasks/todo/AZ-338_c2_net_vlad.md` → `_docs/02_tasks/done/AZ-338_c2_net_vlad.md`.
Next
Per the autodev orchestrator loop: select Batch 47. C2 production path is now half-built (AZ-336 Protocol + AZ-341 FaissBridge + AZ-338 NetVLAD baseline). The remaining C2 strategies are AZ-337 (UltraVPR primary, 5pt), AZ-339 (MegaLoc + MixVPR, 5pt), AZ-340 (SelaVPR + EigenPlaces + SALAD, 5pt). Other candidates: AZ-358 (C4 OpenCV/GTSAM pose estimator, 5pt), AZ-349 (C3.5 AdHoP refiner, 5pt), AZ-389 (C5 internal orthorectifier, 3pt), AZ-508 (ISO timestamp hygiene, 2pt), or a spec-hygiene PBI to address F2 / F3 from this batch + cumulative F2 / F3.