gps-denied-onboard/_docs/03_ip_attribution/netvlad.md
Oleksandr Bezdieniezhnykh ba70381346
Update NetVLAD checkpoint paths and enhance .gitignore
- Changed paths in documentation and configuration files to reflect the new naming convention for the NetVLAD model, transitioning from `models/netvlad/netvlad.pt` to `models/net_vlad/net_vlad.pt`.
- Updated the `.gitignore` to include additional file types and directories related to input data and locally-generated evidence frames.
- Removed the old NetVLAD checkpoint file as part of the transition to the new naming scheme.

These changes ensure consistency across the project and improve the management of generated files.
2026-05-31 19:27:32 +03:00


NetVLAD-VGG16 Checkpoint — Provenance & License

Artifact: models/net_vlad/net_vlad.pt
Note: The file stem MUST equal c2_vpr.net_vlad.MODEL_NAME == "net_vlad", because the PyTorch FP16 runtime uses path.stem as the architecture-registry lookup key.
Generated: 2026-05-29 (AZ-965)
Architecture: project-owned _NetVladVgg16 in src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py
Parameters: 149,002,112 (~568.4 MiB fp32)
SHA-256: 745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444
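
The stem requirement in the note above can be checked before a checkpoint is handed to the runtime. The following is a minimal sketch, assuming only what the note states (the lookup key is path.stem and must equal "net_vlad"); the helper name check_checkpoint_stem is hypothetical and not part of the project.

```python
from pathlib import Path

# Mirrors the value of c2_vpr.net_vlad.MODEL_NAME quoted in the note above.
MODEL_NAME = "net_vlad"


def check_checkpoint_stem(path, expected=MODEL_NAME):
    """Return the file stem, or raise ValueError if it would miss the registry key.

    Hypothetical helper: the real runtime performs its own lookup.
    """
    stem = Path(path).stem
    if stem != expected:
        raise ValueError(f"checkpoint stem {stem!r} != registry key {expected!r}")
    return stem


# The current artifact path passes; the stem is "net_vlad".
check_checkpoint_stem("models/net_vlad/net_vlad.pt")
```

Under the same assumption, the pre-rename path models/netvlad/netvlad.pt would be rejected, because its stem is "netvlad".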

This checkpoint is a pipeline-integration scaffold, not a retrieval-quality artifact. The encoder weights come from a real public source (torchvision IMAGENET1K_V1), but the NetVLAD pool and PCA tail are deterministic-random — they have NOT been trained for visual place recognition. The orchestrator will run end-to-end with these weights, but retrieval results will be effectively random.

Composition

| Layer | Source | License | Trained-for-VPR? |
|---|---|---|---|
| encoder.0 to encoder.28 (26 keys, VGG16 features[:-2]) | torchvision.models.vgg16(weights="IMAGENET1K_V1") | BSD-3-Clause | No (ImageNet classification) |
| pool.conv.weight (64, 512, 1, 1) | torch.manual_seed(0) → arch-default init | Project-owned | No |
| pool.conv.bias (64,) | Same | Project-owned | No |
| pool.centroids (64, 512) | Same | Project-owned | No |
| pca.weight (4096, 32768) | Same | Project-owned | No |
| pca.bias (4096,) | Same | Project-owned | No |

Total: 31 state_dict keys; loads strictly into make_net_vlad_vgg16(num_clusters=64, encoder_dim=512, descriptor_dim=4096).
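
The key count and parameter count can be cross-checked from the shapes in the table using plain arithmetic. This is a sketch, assuming the standard VGG16 feature layout (3x3 convolutions, each followed by a ReLU, with five max-pool layers); it does not load the checkpoint or import torch.

```python
from math import prod

# Standard VGG16 "features" layout: output channels per conv, "M" = max-pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_conv_layers():
    """Yield (module_index, in_channels, out_channels) for each 3x3 conv."""
    index, in_ch = 0, 3
    for item in VGG16_CFG:
        if item == "M":
            index += 1              # MaxPool2d occupies one module slot
        else:
            yield index, in_ch, item
            in_ch = item
            index += 2              # Conv2d plus the ReLU that follows it


convs = list(vgg16_conv_layers())
encoder_params = sum(i * o * 9 + o for _, i, o in convs)  # 3x3 weights + bias
encoder_keys = 2 * len(convs)                             # weight and bias per conv

# Shapes of the randomly initialised tail, copied from the table above.
tail_shapes = {
    "pool.conv.weight": (64, 512, 1, 1),
    "pool.conv.bias": (64,),
    "pool.centroids": (64, 512),
    "pca.weight": (4096, 32768),
    "pca.bias": (4096,),
}
tail_params = sum(prod(shape) for shape in tail_shapes.values())

total_keys = encoder_keys + len(tail_shapes)
total_params = encoder_params + tail_params
size_mib = total_params * 4 / 2**20                       # fp32 = 4 bytes each

print(total_keys, total_params, round(size_mib, 1))
# prints: 31 149002112 568.4
```

The last convolution lands at module index 28, which is why dropping the final ReLU and max-pool (features[:-2]) leaves keys encoder.0 through encoder.28.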

Encoder licence (BSD-3-Clause)

torchvision.models.vgg16 weights are distributed by PyTorch under the BSD-3-Clause licence:

Copyright (c) 2016-, PyTorch Contributors.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: …

Full text: https://github.com/pytorch/vision/blob/main/LICENSE (torchvision project). The model weights themselves are derived from the ImageNet dataset; commercial use of ImageNet-derived models is subject to the ImageNet terms of access (https://www.image-net.org/download.php).

How to reproduce

# From repo root, in the project virtualenv:
source .venv/bin/activate

# torchvision IMAGENET1K_V1 weights download requires HTTPS cert
# validation. On macOS with Python.org installer the system trust
# store is not used by default; export certifi's bundle:
export SSL_CERT_FILE="$(python -c 'import certifi; print(certifi.where())')"

# Generate the checkpoint:
python scripts/mk_netvlad_checkpoint.py
# → writes models/net_vlad/net_vlad.pt

The script is deterministic (torch.manual_seed(0) before the random-init layers, IMAGENET1K_V1 weights are content-addressed). Re-running on a different machine yields the same SHA-256.
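
A regenerated file can be compared against the SHA-256 recorded at the top of this document. The sketch below uses only the standard library; the helper names sha256_of_file and verify_checkpoint are hypothetical and not part of the project's scripts.

```python
import hashlib

# Digest recorded in this document for models/net_vlad/net_vlad.pt.
EXPECTED_SHA256 = "745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the ~568 MiB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest matches the recorded one."""
    return sha256_of_file(path) == expected
```

Calling verify_checkpoint("models/net_vlad/net_vlad.pt") from the repo root after running the generation script would then indicate whether the rebuild matches the recorded artifact.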

Why this isn't a real-retrieval checkpoint

AZ-965 was scoped at 3 SP to unblock the skip-gate that the AZ-840 orchestrator hits when c10_provisioning.backbones is empty. A real-retrieval checkpoint requires one of:

  1. Translate Nanne's Pittsburgh-30k weights (https://github.com/Nanne/pytorch-NetVlad). With Nanne's default vladv2=False, pool.conv is built with bias=False (so their state_dict has no bias key), whereas the project's architecture uses bias=True. WPCA is also stored separately as a 1×1 nn.Conv2d with weight shape (4096, 32768, 1, 1) and would need a reshape→nn.Linear conversion. Estimated 5-8 SP for the translation script plus follow-up Tier-2 verification.
  2. Train from scratch on aerial-imagery datasets (e.g. xView, BigEarthNet, NWPU-RESISC45). Multi-week effort with GPU compute budget.
  3. Use an internal team checkpoint if one exists.

This is filed as the AZ-965 follow-up (see the AZ-965 spec for ticket reference).
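
For option 1, the two structural mismatches (the missing pool bias and the convolutional WPCA layer) can be planned at the shape level before any tensors are touched. The following is a rough dry-run sketch, not the translation script itself. The source key names wpca.weight and wpca.bias are hypothetical placeholders, the WPCA weight is assumed to have shape (4096, 32768, 1, 1), and zero-filling the bias is one possible choice (a zero bias leaves the convolution output unchanged). Encoder keys are omitted.

```python
def conv1x1_weight_to_linear_shape(conv_shape):
    """Map a 1x1 Conv2d weight shape (out, in, 1, 1) to a Linear weight shape (out, in)."""
    out_ch, in_ch, k_h, k_w = conv_shape
    if (k_h, k_w) != (1, 1):
        raise ValueError(f"expected a 1x1 kernel, got {(k_h, k_w)}")
    return (out_ch, in_ch)


def plan_translation(source_shapes):
    """Return {target_key: (action, target_shape)} for the pool and PCA tail.

    Source key names for the WPCA layer are hypothetical; adjust them to the
    actual upstream checkpoint.
    """
    plan = {}
    for key in ("pool.conv.weight", "pool.centroids"):
        plan[key] = ("copy", source_shapes[key])

    # Upstream conv has no bias; the project expects one per cluster.
    num_clusters = source_shapes["pool.conv.weight"][0]
    plan["pool.conv.bias"] = ("zero-fill", (num_clusters,))

    # The 1x1 conv weight drops its two trailing singleton dimensions.
    plan["pca.weight"] = (
        "reshape",
        conv1x1_weight_to_linear_shape(source_shapes["wpca.weight"]),
    )
    plan["pca.bias"] = ("copy", source_shapes["wpca.bias"])
    return plan


# Assumed source shapes for illustration only.
example_source = {
    "pool.conv.weight": (64, 512, 1, 1),
    "pool.centroids": (64, 512),
    "wpca.weight": (4096, 32768, 1, 1),
    "wpca.bias": (4096,),
}
plan = plan_translation(example_source)
```

The resulting target shapes match the tail rows of the Composition table, which is the precondition for a strict state_dict load.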

Observable behaviour with this checkpoint

With this scaffold checkpoint and the Derkachi clip:

  • c10_provisioning.compile_engines_for_corpus succeeds (the PyTorch FP16 runtime's compile_engine is a no-op that only computes the SHA-256 of the .pt file and records its path).
  • c2_vpr.NetVladStrategy.create() succeeds (encoder/pool/pca all load, output shape (1, 4096) matches descriptor_dim).
  • embed_query produces valid (1, 4096) fp16 vectors per frame.
  • retrieve_topk produces top-K matches — but they are effectively random, because the NetVLAD pool + PCA never learned a semantic embedding space.
  • Downstream ESKF measurement updates fed from random tile matches will likely diverge — surfacing as a SEPARATE failure mode that's NOT the empty-backbones gate AZ-965 closed.

That ESKF divergence under garbage retrievals is the EXPECTED next gate for the orchestrator chain, and is a separate ticket from AZ-965.
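
The retrieve_topk behaviour described above follows from how nearest-neighbour retrieval works: a ranking is always returned, whether or not the descriptors encode anything about place. The sketch below illustrates this with a generic cosine-similarity top-K over random vectors. It is a stand-in under stated assumptions, not the project's retrieve_topk; cosine_topk is a hypothetical name and the 64-dimensional vectors stand in for the 4096-d descriptors.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_topk(query, database, k):
    """Return indices of the k database vectors with highest cosine similarity."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


rng = random.Random(0)
dim = 64  # small stand-in for the 4096-d descriptors
database = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
query = [rng.gauss(0, 1) for _ in range(dim)]

# A ranked list comes back even though every vector here is pure noise.
top5 = cosine_topk(query, database, k=5)
```

Nothing in the ranking step signals that the descriptors are untrained, which is consistent with the failure surfacing downstream rather than at retrieval time.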