Oleksandr Bezdieniezhnykh 773d589d34 [AZ-338] Archive task spec + batch 46 report + state bump
- _docs/02_tasks/todo/AZ-338_c2_net_vlad.md
  -> _docs/02_tasks/done/AZ-338_c2_net_vlad.md
- _docs/03_implementation/batch_46_cycle1_report.md (new)
- _docs/_autodev_state.md: last_completed_batch 45 -> 46;
  sub_step.detail "batch 46 complete - selecting batch 47"

AZ-338 transitioned in Jira: In Progress -> In Testing.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 22:31:56 +03:00


Batch 46 / Cycle 1 — Implementation Report

Date: 2026-05-13
Tasks: AZ-338 (C2 NetVLAD Mandatory Simple-Baseline, 3pt)
Total complexity: 3 points
Result: PASS_WITH_WARNINGS (per-batch code review)
Jira tracker state: AZ-338 transitioned To Do → In Progress → In Testing

Scope

NetVLAD is the C2 comparative baseline mandated by the engine rule: every production-default backbone ships with a simple baseline alongside it (description.md § 1). Per § 5, the baseline runs on the C7 PyTorch FP16 runtime, NOT TensorRT. This runtime isolation means a TRT engine-compile bug cannot break the baseline and the primary at the same time. This batch lands the first concrete VprStrategy implementation. It validates the AZ-341 FaissBridge plumbing and establishes the pattern that AZ-337 / AZ-339 / AZ-340 will follow.

Files Changed

Production (new)

  • src/gps_denied_onboard/components/c2_vpr/net_vlad.py: NetVladStrategy class implementing the VprStrategy Protocol. Constructor wires InferenceRuntimeCut + DescriptorIndexCut + NetVladBackbonePreprocessor + DescriptorNormaliser + FaissBridge. Module-level MODEL_NAME + architecture_factory() exposed for the composition root to bind to C7's architecture registry. Module-level create(config, descriptor_index, inference_runtime) factory consumed by build_vpr_strategy.
  • src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py — canonical NetVLAD-VGG16 architecture (Arandjelović et al. 2016): VGG16 feature extractor up to conv5_3 + NetVLAD pooling layer (soft-assign 1x1 conv + cluster centroids + batched residual aggregation) + optional nn.Linear(K*D, descriptor_dim) PCA projection. K=64, D=512 (VGG16 conv5_3 channels), default descriptor_dim=4096. Torch / torchvision imported lazily inside the factory.
  • src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py: NetVladBackbonePreprocessor implementing the C2-internal BackbonePreprocessor Protocol. Decode → centre-crop square → resize (480, 480) → ImageNet mean/std → FP16 NCHW.
  • src/gps_denied_onboard/components/c2_vpr/inference_runtime_cut.py — NEW AZ-507 consumer-side cut mirroring the subset of C7 InferenceRuntime that C2 strategies consume (compile_engine, deserialize_engine, infer, release_engine, current_runtime_label). Lets c2_vpr stay AZ-507-clean.
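A consumer-side cut is a structural Protocol declared inside c2_vpr. It lists only the runtime methods C2 calls, so any object with matching methods satisfies it without importing C7. The sketch below is a minimal illustration under assumptions. The class names, type hints, the release_engine(handle) signature, the treatment of current_runtime_label as a method, and the "pytorch_fp16" label are all hypothetical. The compile_engine / deserialize_engine / infer shapes follow the C7 v1.0.0 names quoted in this report.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InferenceRuntimeCutSketch(Protocol):
    """Hypothetical C2-side view of the C7 runtime: only the methods C2 consumes."""

    def compile_engine(self, model_path: str, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: Any) -> Any: ...
    def release_engine(self, handle: Any) -> None: ...
    def current_runtime_label(self) -> str: ...


class FakePytorchRuntime:
    """Test double: satisfies the cut structurally, never imports or subclasses C7."""

    def compile_engine(self, model_path, build_config):
        return {"model_path": model_path, "build_config": build_config}

    def deserialize_engine(self, entry):
        return ("handle", entry["model_path"])

    def infer(self, handle, inputs):
        return [x * 2 for x in inputs]

    def release_engine(self, handle):
        return None

    def current_runtime_label(self):
        return "pytorch_fp16"  # hypothetical label


class IncompleteRuntime:
    """Missing most of the cut's methods, so it must not pass the isinstance check."""

    def infer(self, handle, inputs):
        return inputs
```

Note that isinstance() against a @runtime_checkable Protocol only checks that the methods exist. It does not check their signatures, so the unit tests still need to exercise real calls.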

Production (modified)

  • src/gps_denied_onboard/components/c2_vpr/config.py — added netvlad_descriptor_dim: int = 4096 knob + __post_init__ validation.
  • src/gps_denied_onboard/components/c2_vpr/__init__.py — re-exported InferenceRuntimeCut.
  • src/gps_denied_onboard/helpers/descriptor_normaliser.py — added intra_cluster_normalise(descriptor, num_clusters) for NetVLAD's dual-stage normalisation chain. Backward-compatible function addition (v1.0.0 → v1.1.0).
  • src/gps_denied_onboard/fdr_client/records.py — registered three new record kinds: vpr.embed_query, vpr.backbone_error, vpr.preprocess_error.
  • src/gps_denied_onboard/runtime_root/vpr_factory.py: added _register_strategy_architecture helper that binds (MODEL_NAME, architecture_factory(descriptor_dim)) to C7's architecture registry before delegating to the strategy's create() factory. This keeps the c7 import at L4 and preserves AZ-507.
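The registration step in vpr_factory.py can be pictured with the minimal sketch below. Several pieces are hypothetical stand-ins: the plain-dict registry for C7's architecture registry, the build_strategy_sketch function (the real build_vpr_strategy signature is not shown in this report), the fake module, and the "net_vlad" MODEL_NAME value. The sketch assumes, as the "architecture-factory closure" tests suggest, that architecture_factory(descriptor_dim) returns a zero-argument factory.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict

# Hypothetical stand-in for C7's architecture registry: MODEL_NAME -> zero-arg module factory.
ARCHITECTURE_REGISTRY: Dict[str, Callable[[], Any]] = {}


def _register_strategy_architecture(strategy_module: Any, descriptor_dim: int) -> None:
    """Bind (MODEL_NAME, architecture_factory(descriptor_dim)) before the strategy is built."""
    factory = strategy_module.architecture_factory(descriptor_dim)
    ARCHITECTURE_REGISTRY[strategy_module.MODEL_NAME] = factory


def build_strategy_sketch(strategy_module, config, descriptor_index, inference_runtime, descriptor_dim):
    # Only this composition-root function touches the (stand-in) C7 registry;
    # the strategy module itself never imports C7.
    _register_strategy_architecture(strategy_module, descriptor_dim)
    return strategy_module.create(config, descriptor_index, inference_runtime)


# Fake strategy module exposing the three module-level names described above.
fake_net_vlad = SimpleNamespace(
    MODEL_NAME="net_vlad",
    architecture_factory=lambda dim: (lambda: {"arch": "netvlad_vgg16", "descriptor_dim": dim}),
    create=lambda config, index, runtime: ("NetVladStrategy", config, index, runtime),
)

strategy = build_strategy_sketch(
    fake_net_vlad, {"netvlad_descriptor_dim": 4096}, "index", "runtime", descriptor_dim=4096
)
```

Because the binding lives in the composition root, each later PyTorch-runtime strategy only has to expose the same three module-level names.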

Tests

  • tests/unit/c2_vpr/test_net_vlad.py — NEW, 31 tests covering all 11 ACs (with per-AC variants for AC-2, AC-6, AC-8, AC-9, AC-11) + preprocessor contract (4) + constructor validation (3) + FDR record emission (2) + architecture-factory closure (2).
  • tests/unit/test_az283_descriptor_normaliser.py — +8 tests for intra_cluster_normalise: per-cluster unit norm, dtype preservation, zero-cluster handling, non-divisible length rejection, 2-D rejection, zero/bool num_clusters rejection, float64 rejection, no-mutation invariant.
  • tests/unit/test_az272_fdr_record_schema.py — +3 fixture payloads for the three new VPR record kinds so the schema roundtrip test exercises all of them.
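The behaviours the new intra_cluster_normalise tests pin down can be illustrated with the NumPy sketch below. It is not the helper's actual code. The exception types, the accepted dtypes (FP16/FP32), and the l2_normalise stand-in are assumptions; only the listed behaviours come from the test list above. The trailing lines also show why the order matters. Each unit-norm block contributes 1 to the squared norm, so after the intra-cluster step the whole vector has norm sqrt(K), and only a final global L2 pass brings it back to 1.0.

```python
import numpy as np

# Assumed accepted dtypes; the report only states that float64 is rejected.
_ACCEPTED_DTYPES = (np.float16, np.float32)


def intra_cluster_normalise(descriptor: np.ndarray, num_clusters: int) -> np.ndarray:
    """Unit-normalise each of the num_clusters contiguous blocks of a flat VLAD vector."""
    # bool is a subclass of int, so reject it explicitly before the positivity check.
    if isinstance(num_clusters, bool) or not isinstance(num_clusters, int) or num_clusters <= 0:
        raise ValueError(f"num_clusters must be a positive int, got {num_clusters!r}")
    if descriptor.ndim != 1:
        raise ValueError(f"expected a 1-D descriptor, got shape {descriptor.shape}")
    if descriptor.dtype not in _ACCEPTED_DTYPES:
        raise TypeError(f"unsupported dtype {descriptor.dtype}")
    if descriptor.shape[0] % num_clusters != 0:
        raise ValueError(
            f"length {descriptor.shape[0]} is not divisible by num_clusters={num_clusters}"
        )
    # astype() returns a copy, so the caller's array is never mutated.
    # Norms are computed in float32 to avoid FP16 overflow when squaring.
    blocks = descriptor.astype(np.float32).reshape(num_clusters, -1)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    # Empty clusters (all-zero blocks) stay zero instead of dividing by zero.
    safe_norms = np.where(norms > 0.0, norms, 1.0)
    return (blocks / safe_norms).reshape(-1).astype(descriptor.dtype)


def l2_normalise(descriptor: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the existing global L2 helper."""
    as_f32 = descriptor.astype(np.float32)
    norm = float(np.linalg.norm(as_f32))
    return (as_f32 / norm if norm > 0.0 else as_f32).astype(descriptor.dtype)


rng = np.random.default_rng(0)
vlad = rng.standard_normal(64 * 8).astype(np.float32)  # K=64 clusters, toy D=8
final = l2_normalise(intra_cluster_normalise(vlad, 64))   # mandated order: norm 1.0
reversed_order = intra_cluster_normalise(l2_normalise(vlad), 64)  # wrong order: norm sqrt(64) = 8
```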

Docs

  • _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md — v1.0.0 → v1.1.0; added intra_cluster_normalise row + changelog entry.
  • _docs/03_implementation/reviews/batch_46_review.md — per-batch code review (PASS_WITH_WARNINGS).

Acceptance Criteria Coverage

All 11 ACs of AZ-338 have at least one covering unit test:

| AC | Description | Status |
|---|---|---|
| AC-1 | Protocol conformance | covered |
| AC-2 | L2 norm == 1.0 ± 1e-3; FP16, shape (D,) | covered (4096 + 512 variants) |
| AC-3 | intra_cluster_normalise runs BEFORE l2_normalise | covered (spy + once-each) |
| AC-4 | Deterministic across 3 calls | covered |
| AC-5 | retrieve_topk returns k results, label="net_vlad", sorted | covered |
| AC-6 | descriptor_dim() stable | covered (4096 + 512 variants) |
| AC-7 | Engine output shape mismatch → ConfigError | covered |
| AC-8 | VprBackboneError on forward failure | covered (3 variants) |
| AC-9 | VprPreprocessError on corrupt image | covered (3 variants) |
| AC-10 | Composition-root wiring + c2.vpr.ready log | covered (log + model_name forcing) |
| AC-11 | BUILD_PYTORCH_RUNTIME=OFF → ConfigError fail-fast | covered (tensorrt + onnx_trt_ep variants) |
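The AC-3 "spy + once-each" style of test can be written with unittest.mock, as in the minimal sketch below. It assumes the normaliser is reachable as an injectable object. The finalise_descriptor function and the fake normaliser are hypothetical, not the real NetVladStrategy internals. A Mock created with wraps= passes every call through to the real object, and it records child calls on the parent's mock_calls in the order they happened. One list comparison therefore checks both the ordering and that each function ran exactly once.

```python
from types import SimpleNamespace
from unittest import mock


def finalise_descriptor(raw, num_clusters, normaliser):
    """Hypothetical NetVLAD post-processing: intra-cluster first, then global L2."""
    intra = normaliser.intra_cluster_normalise(raw, num_clusters)
    return normaliser.l2_normalise(intra)


def test_intra_cluster_runs_before_l2():
    real = SimpleNamespace(
        intra_cluster_normalise=lambda descriptor, k: ("intra", descriptor),
        l2_normalise=lambda descriptor: ("l2", descriptor),
    )
    spy = mock.Mock(wraps=real)  # calls go through to `real` and are recorded in order
    result = finalise_descriptor("raw", 64, spy)
    assert spy.mock_calls == [
        mock.call.intra_cluster_normalise("raw", 64),
        mock.call.l2_normalise(("intra", "raw")),
    ]
    assert result == ("l2", ("intra", "raw"))
```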

Test Results

  • Full unit suite: 1608 passed / 80 environment-skipped / 0 failed in ~81s, up from 1565 passed at the close of Batch 45 (+43 new tests).
  • Focused per-component: c2_vpr/test_net_vlad.py 31/31 PASS; c2_vpr/test_faiss_bridge.py 22/22 PASS (no regression); test_az283_descriptor_normaliser.py 23/23 PASS (15 original + 8 new); test_az272_fdr_record_schema.py 3/3 PASS.
  • Lint: ruff check clean on every new + modified file.
  • AZ-507 layering lint: test_ac6_only_compose_root_imports_concrete_strategies PASS.

Architectural Decisions

  1. Architecture-registration moved to composition root. The AZ-338 spec implied c2_vpr.net_vlad.create() registers the NetVLAD nn.Module factory with C7. That violates AZ-507 (no cross-component imports). Resolved by exposing MODEL_NAME + architecture_factory(descriptor_dim) on the strategy module and having runtime_root/vpr_factory.py::_register_strategy_architecture perform the c7 binding before calling the strategy's create() factory. Pattern is generalisable to AZ-337 / AZ-339 / AZ-340 strategies that also use the PyTorch runtime.

  2. C7 API names aligned with the v1.0.0 Protocol. The spec uses runtime.forward(engine_id, ...) and inference_runtime.load_engine(weights_path). The live C7 Protocol (AZ-297) instead exposes compile_engine(model_path, build_config) → entry, deserialize_engine(entry) → handle, and infer(handle, inputs). The implementation follows the v1.0.0 Protocol; spec § Outcome is stale on these names (flagged as F2 in the code review).

  3. InferenceRuntimeCut. The new AZ-507 consumer-side cut joins DescriptorIndexCut (AZ-341), TileDownloaderCut (AZ-328), TileUploaderCut (AZ-329) and FdrFooterReader (AZ-329). That makes five structural Protocol cuts across the codebase, all named *Cut or *Reader and all decorated with @runtime_checkable. The pattern is stable.

  4. Dual-stage normalisation order. intra_cluster_normalise BEFORE l2_normalise is mandatory per AZ-338 spec § Constraints and per the published Pittsburgh NetVLAD preprocessing chain. Verified by AC-3 spy. Reversing the order would silently break AC-2.1b (recall regression).

  5. PCA-projection inside the architecture. The published Pittsburgh NetVLAD reference ships with a learned Linear(K*D=32768, 4096) PCA-whitening layer. The architecture embeds the layer as nn.Linear(K*D, descriptor_dim); when descriptor_dim == K*D the layer is omitted (raw VLAD). Default descriptor_dim=4096 per the spec. The .pth state dict is expected to carry the PCA weights alongside the rest of the model parameters.

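  6. NetVLAD pooling implemented via torch.bmm, not a Python for k in range(K) loop. The arithmetic is the same as the reference per-cluster loop (K=64). The residual aggregation, however, is issued as one batched matmul instead of K separate kernel launches, which makes it much faster on GPU than the reference Python-loop form.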
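The batched form relies on a simple rewrite. For soft assignments a_ik, local features x_i and centroids c_k, the residual sum Σ_i a_ik (x_i − c_k) equals (AᵀX)_k − (Σ_i a_ik) c_k. That is one batched matrix product plus a broadcast subtraction. The NumPy sketch below checks that identity against the per-cluster loop, with np.matmul standing in for torch.bmm. It deliberately omits the per-location L2 normalisation of the conv features and the learned weights of the real layer. Sizes are toy values and all function names are illustrative.

```python
import numpy as np


def soft_assign(features, weight, bias):
    """1x1-conv soft assignment as a matmul: (B, N, D) @ (D, K) + b, softmax over K."""
    logits = features @ weight + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    expd = np.exp(logits)
    return expd / expd.sum(axis=-1, keepdims=True)


def vlad_batched(features, assign, centroids):
    """features (B, N, D), assign (B, N, K), centroids (K, D) -> (B, K, D)."""
    weighted = np.matmul(assign.transpose(0, 2, 1), features)  # (B, K, D): one batched matmul
    mass = assign.sum(axis=1)[:, :, None]                        # (B, K, 1): total weight per cluster
    return weighted - mass * centroids[None, :, :]


def vlad_loop(features, assign, centroids):
    """Reference per-cluster loop, kept only for comparison."""
    batch, _, dim = features.shape
    num_clusters = centroids.shape[0]
    out = np.zeros((batch, num_clusters, dim))
    for k in range(num_clusters):
        residual = features - centroids[k]                         # (B, N, D)
        out[:, k, :] = (assign[:, :, k:k + 1] * residual).sum(axis=1)
    return out


rng = np.random.default_rng(0)
B, N, D, K = 2, 16, 8, 4  # toy sizes; the report's model uses D=512, K=64
features = rng.standard_normal((B, N, D))
weight, bias = rng.standard_normal((D, K)), rng.standard_normal(K)
centroids = rng.standard_normal((K, D))
assign = soft_assign(features, weight, bias)
fast = vlad_batched(features, assign, centroids)
slow = vlad_loop(features, assign, centroids)
flat = fast.reshape(B, K * D)  # flattened VLAD vector handed to the normalisation chain
```

At the report's sizes the flattened vector has K·D = 64 × 512 = 32768 entries. That is the input width of the optional Linear(K*D, descriptor_dim) PCA projection.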

Carried-over Findings

  • F1 from cumulative review 43-45: _iso_ts_from_clock duplicated across 6 modules — c2_vpr/net_vlad.py is the 6th copy. AZ-508 hygiene PBI exists for consolidation.

New Findings (per Batch 46 code review)

  • F2 (Low, Spec-Hygiene): AZ-338 spec § Outcome uses outdated C7 API names + implies an AZ-507-violating architecture-registration location. Recommend a spec-hygiene follow-up that refreshes AZ-337..AZ-340 against the stabilised C7 v1.0.0 + AZ-507 patterns.
  • F3 (Low, Test-Coverage): NFR-perf microbench (embed_query p95 ≤ 80ms on Tier-1 Jetson) deferred — no real PyTorch CUDA host on the dev tier. Schedule under FT-P-19 / C2-IT-01 on Tier-1 hardware.
  • F4 (Low, Architecture): PCA-projection sidecar verification deferred — the architecture loads PCA weights via load_state_dict(strict=True) only; no cross-check against an AZ-280 sidecar manifest yet.
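The deferred F3 microbench reduces to collecting per-call latencies and comparing their 95th percentile to the 80 ms budget. The stdlib sketch below shows one way to do that, assuming a wall-clock timer and the default "exclusive" interpolation of statistics.quantiles. The function names and warm-up count are hypothetical. On a CUDA backend the timed callable would also have to synchronise the device before returning, or the timer would only measure kernel launch.

```python
import statistics
import time
from typing import Callable, List

P95_BUDGET_MS = 80.0  # embed_query NFR from F3; only meaningful on Tier-1 Jetson hardware


def p95(samples_ms: List[float]) -> float:
    """95th percentile: the 95th of the 99 cut points returned for n=100."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    return statistics.quantiles(samples_ms, n=100)[94]


def microbench_ms(fn: Callable[[], object], warmup: int = 5, iterations: int = 100) -> List[float]:
    """Time `iterations` calls of fn after `warmup` untimed calls; returns milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def within_budget(samples_ms: List[float]) -> bool:
    return p95(samples_ms) <= P95_BUDGET_MS
```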

Jira Tracker

  • AZ-338 transitioned: To Do → In Progress → In Testing.
  • Task spec archived: _docs/02_tasks/todo/AZ-338_c2_net_vlad.md → _docs/02_tasks/done/AZ-338_c2_net_vlad.md.

Next

Per the autodev orchestrator loop: select Batch 47. C2 production path is now half-built (AZ-336 Protocol + AZ-341 FaissBridge + AZ-338 NetVLAD baseline). The remaining C2 strategies are AZ-337 (UltraVPR primary, 5pt), AZ-339 (MegaLoc + MixVPR, 5pt), AZ-340 (SelaVPR + EigenPlaces + SALAD, 5pt). Other candidates: AZ-358 (C4 OpenCV/GTSAM pose estimator, 5pt), AZ-349 (C3.5 AdHoP refiner, 5pt), AZ-389 (C5 internal orthorectifier, 3pt), AZ-508 (ISO timestamp hygiene, 2pt), or a spec-hygiene PBI to address F2 / F3 from this batch + cumulative F2 / F3.