gps-denied-onboard/_docs/03_ip_attribution/netvlad.md
Oleksandr Bezdieniezhnykh 97f5f9793c [AZ-965] NetVLAD-VGG16 backbone checkpoint + YAML/compose wiring
AZ-965 ships the NetVLAD .pt checkpoint that clears the AZ-839
empty-c10_provisioning.backbones SKIP gate. Pipeline-integration
scaffold — encoder is real, NetVLAD tail is honestly labelled as
untrained.

Composition:

* Encoder (26 keys, encoder.0..encoder.28): torchvision
  vgg16(weights=IMAGENET1K_V1) features [:-2], BSD-3-Clause.
  Real ImageNet-pretrained VGG16 conv stack.
* NetVLAD pool + PCA tail (5 keys: pool.conv.{weight,bias},
  pool.centroids, pca.{weight,bias}): random-init via
  torch.manual_seed(0). NOT trained for visual place recognition.

Total: 149,002,112 params (568.4 MiB fp32, sha256=745c6f29...).
Round-trip verified locally: torch.load(weights_only=True) +
load_state_dict(strict=True) succeed; forward(1,3,480,480) emits
{'vlad_descriptor': (1, 4096) fp32} — matches NetVladStrategy
contract per net_vlad.py:247-251.

Two material discoveries documented in the AZ-965 spec:

1. The NetVLAD-VGG16 architecture already lives in repo at
   src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py
   — we instantiate it and save a state_dict, NOT externally source.
2. The PyTorch FP16 runtime expects a .pt state_dict (NOT .onnx).
   BackboneConfig.onnx_path is a misnomer for NetVLAD: per AZ-321
   design + c2_vpr description.md §1, NetVLAD runs on PyTorch FP16
   (NOT TRT). compile_engine is a no-op sha256+path wrap;
   deserialize_engine does torch.load(weights_only=True) +
   load_state_dict(strict=True).

User skipped Option A/B/C/D/E question — judgment call = Option B
(IMAGENET1K_V1 + random tail) per "use judgment, don't block":
* Option A (Nanne translation) was 5-8 SP, above the 5 SP budget.
* Option B is 3 SP, fits the budget, honestly labelled.
* Option C (pure random) was borderline-dishonest per Real Results.

Files:

* scripts/mk_netvlad_checkpoint.py — deterministic generator.
* models/netvlad/netvlad.pt — 568 MiB, via git-lfs (.gitattributes
  extended for models/**/*.pt, *.onnx, *.engine).
* configs/operator_replay.yaml — c2_vpr + c10_provisioning blocks
  populated; the field literally named onnx_path actually points
  at the .pt for NetVLAD per the runtime semantics noted above.
* docker-compose.test.jetson.yml — ./models:/opt/models:ro bind
  mount added to e2e-runner.
* _docs/03_ip_attribution/netvlad.md — provenance, licence, how-to-
  reproduce, honest scope statement ("NOT a real-retrieval
  checkpoint; ESKF divergence under garbage retrievals is the
  expected next gate").
* _docs/02_tasks/todo/AZ-965_netvlad_onnx_backbone_provisioning.md
  — rewritten to reflect the .pt-not-.onnx + Option B discoveries.

Tier-2 verification follows in a separate commit after the harness
run confirms the empty-backbones SKIP gate clears.

Out of scope (filed as follow-ups):

* Real-retrieval NetVLAD weights (Nanne Pittsburgh-30k translation
  or internal team checkpoint) — separate ticket.
* AZ-840 orchestrator PASSing end-to-end (depends on retrieval
  quality + ESKF stability).
* AZ-963 60s smoke ESKF divergence (independent chain).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 18:03:32 +03:00


NetVLAD-VGG16 Checkpoint — Provenance & License

Artifact: models/netvlad/netvlad.pt
Generated: 2026-05-29 (AZ-965)
Architecture: project-owned _NetVladVgg16 in src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py
Parameters: 149,002,112 (~568.4 MiB fp32)
SHA-256: 745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444

This checkpoint is a pipeline-integration scaffold, not a retrieval-quality artifact. The encoder weights come from a real public source (torchvision IMAGENET1K_V1), but the NetVLAD pool and PCA tail are deterministic-random — they have NOT been trained for visual place recognition. The orchestrator will run end-to-end with these weights, but retrieval results will be effectively random.

Composition

| Layer | Source | License | Trained-for-VPR? |
|---|---|---|---|
| encoder.0..encoder.28 (26 keys, VGG16 features [:-2]) | torchvision.models.vgg16(weights="IMAGENET1K_V1") | BSD-3-Clause | No (ImageNet classification) |
| pool.conv.weight (64, 512, 1, 1) | torch.manual_seed(0) → arch-default init | Project-owned | No |
| pool.conv.bias (64,) | Same | Project-owned | No |
| pool.centroids (64, 512) | Same | Project-owned | No |
| pca.weight (4096, 32768) | Same | Project-owned | No |
| pca.bias (4096,) | Same | Project-owned | No |

Total: 31 state_dict keys; loads strictly into make_net_vlad_vgg16(num_clusters=64, encoder_dim=512, descriptor_dim=4096).
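
The key count and parameter total can be re-derived from the shapes in the composition table. The sketch below is illustrative only and is not the project's generator script: `VGG16_CFG`, `expected_shapes` and `count_params` are hypothetical names, and the layer list assumes the standard VGG16 layout used by torchvision (3×3 convolutions, each followed by a ReLU, with max-pools in between).

```python
import math

# Standard VGG16 layout: ints are conv output channels, "M" is a max-pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def expected_shapes(num_clusters=64, encoder_dim=512, descriptor_dim=4096):
    """Return {state_dict key: shape} for the layout in the composition table."""
    shapes = {}
    index, in_channels = 0, 3
    for item in VGG16_CFG:
        if item == "M":
            index += 1  # MaxPool2d occupies one slot and has no parameters
            continue
        shapes[f"encoder.{index}.weight"] = (item, in_channels, 3, 3)
        shapes[f"encoder.{index}.bias"] = (item,)
        index += 2  # Conv2d plus the ReLU that follows it
        in_channels = item
    # features[:-2] drops only the final ReLU and MaxPool2d, which carry no
    # parameters, so every conv key above survives the slice.
    shapes["pool.conv.weight"] = (num_clusters, encoder_dim, 1, 1)
    shapes["pool.conv.bias"] = (num_clusters,)
    shapes["pool.centroids"] = (num_clusters, encoder_dim)
    shapes["pca.weight"] = (descriptor_dim, num_clusters * encoder_dim)
    shapes["pca.bias"] = (descriptor_dim,)
    return shapes


def count_params(shapes):
    """Sum the element counts of all tensors."""
    return sum(math.prod(shape) for shape in shapes.values())


if __name__ == "__main__":
    shapes = expected_shapes()
    total = count_params(shapes)
    print(len(shapes), "keys")
    print(f"{total:,} parameters")
    print(f"{total * 4 / 2**20:.1f} MiB at fp32")
```

Almost all of the parameters sit in pca.weight (4096 × 32768 = 134,217,728), which is why the file is roughly 568 MiB even though the VGG16 convolution stack contributes under 15 million parameters.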

Encoder licence (BSD-3-Clause)

torchvision.models.vgg16 weights are distributed by PyTorch under the BSD-3-Clause licence:

Copyright (c) 2016-, PyTorch Contributors.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: …

Full text: https://github.com/pytorch/vision/blob/main/LICENSE (torchvision project). The model weights themselves are derived from the ImageNet dataset; commercial use of ImageNet-derived models is subject to the ImageNet terms of access (https://www.image-net.org/download.php).

How to reproduce

# From repo root, in the project virtualenv:
source .venv/bin/activate

# torchvision IMAGENET1K_V1 weights download requires HTTPS cert
# validation. On macOS with Python.org installer the system trust
# store is not used by default; export certifi's bundle:
export SSL_CERT_FILE=$(python -c "import certifi; print(certifi.where())")

# Generate the checkpoint:
python scripts/mk_netvlad_checkpoint.py
# → writes models/netvlad/netvlad.pt

The script is deterministic: torch.manual_seed(0) is called before the random-init layers are created, and the IMAGENET1K_V1 weights are content-addressed. Re-running on a different machine yields the same SHA-256.
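
A regenerated or freshly pulled checkpoint can be compared against the SHA-256 recorded at the top of this document. This is a minimal sketch using only the Python standard library; `sha256_of_file` and `checkpoint_matches` are hypothetical helper names, not part of the repository.

```python
import hashlib
from pathlib import Path

# Digest recorded in this document for models/netvlad/netvlad.pt.
EXPECTED_SHA256 = "745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    checkpoint = Path("models/netvlad/netvlad.pt")
    if not checkpoint.exists():
        print("checkpoint not found:", checkpoint)
    elif checkpoint_matches(checkpoint):
        print("SHA-256 matches")
    else:
        print("SHA-256 MISMATCH")
```

Because the file is tracked with git-lfs, a mismatch on a fresh clone can simply mean the working copy still holds the small LFS pointer file rather than the real checkpoint.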

Why this isn't a real-retrieval checkpoint

AZ-965 was scoped at 3 SP to unblock the AZ-840 orchestrator's empty-c10_provisioning.backbones skip-gate. A real-retrieval checkpoint requires one of:

  1. Translate Nanne's Pittsburgh-30k weights (https://github.com/Nanne/pytorch-NetVlad). With Nanne's default vladv2=False, the pool conv is built with bias=False, so their state_dict has no pool.conv.bias key, whereas the project's architecture uses bias=True. WPCA is also stored separately as a 1×1 nn.Conv2d (weight shape (4096, 32768, 1, 1)) and would need a reshape→nn.Linear conversion. Estimated 5-8 SP for the translation script plus follow-up Tier-2 verification.
  2. Train from scratch on aerial-imagery datasets (e.g. xView, BigEarthNet, NWPU-RESISC45). Multi-week effort with GPU compute budget.
  3. Use an internal team checkpoint if one exists.

This is filed as the AZ-965 follow-up (see the AZ-965 spec for ticket reference).

Observable behaviour with this checkpoint

With this scaffold checkpoint and the Derkachi clip:

  • c10_provisioning.compile_engines_for_corpus succeeds (for the PyTorch FP16 runtime, compile_engine is a no-op that only computes the SHA-256 of the .pt and records its path).
  • c2_vpr.NetVladStrategy.create() succeeds (encoder/pool/pca all load, output shape (1, 4096) matches descriptor_dim).
  • embed_query produces valid (1, 4096) fp16 vectors per frame.
  • retrieve_topk produces top-K matches — but they are effectively random, because the NetVLAD pool + PCA never learned a semantic embedding space.
  • Downstream ESKF measurement updates fed from random tile matches will likely diverge — surfacing as a SEPARATE failure mode that's NOT the empty-backbones gate AZ-965 closed.

That ESKF divergence under garbage retrievals is the EXPECTED next gate for the orchestrator chain, and is a separate ticket from AZ-965.