
# NetVLAD-VGG16 Checkpoint — Provenance & License

**Artifact**: `models/net_vlad/net_vlad.pt`
**Note**: The file stem MUST equal `c2_vpr.net_vlad.MODEL_NAME` (`"net_vlad"`). The PyTorch FP16 runtime uses `path.stem` as the architecture-registry lookup key, so a renamed file will not resolve to an architecture.
**Generated**: 2026-05-29 (AZ-965)
**Architecture**: project-owned `_NetVladVgg16` in `src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py`
**Parameters**: 149,002,112 (~568.4 MiB fp32)
**SHA-256**: `745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444`
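The stem requirement comes from how the runtime looks up architectures. Here is a minimal sketch of that lookup. The `_ARCHITECTURES` dict and `resolve_architecture` helper are hypothetical stand-ins for the PyTorch FP16 runtime's real registry. Only `MODEL_NAME == "net_vlad"` and the use of `path.stem` as the key come from this document.

```python
from pathlib import Path

MODEL_NAME = "net_vlad"  # mirrors c2_vpr.net_vlad.MODEL_NAME as stated in this document

# Hypothetical registry: maps a checkpoint stem to an architecture factory name.
_ARCHITECTURES = {MODEL_NAME: "make_net_vlad_vgg16"}


def resolve_architecture(checkpoint: Path) -> str:
    """Look up the architecture for a checkpoint by its file stem (sketch)."""
    key = checkpoint.stem  # "models/net_vlad/net_vlad.pt" -> "net_vlad"
    try:
        return _ARCHITECTURES[key]
    except KeyError:
        raise KeyError(f"no architecture registered for stem {key!r} ({checkpoint})") from None
```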

This checkpoint is a **pipeline-integration scaffold**, not a retrieval-quality artifact. The encoder weights come from a real public source (torchvision IMAGENET1K_V1). The NetVLAD pool and PCA tail, however, are randomly initialized from a fixed seed. They are deterministic, but they have NOT been trained for visual place recognition. The orchestrator will run end-to-end with these weights, but retrieval results will be effectively random.

## Composition

| Layer | Source | License | Trained-for-VPR? |
|---|---|---|---|
| `encoder.0` … `encoder.28` (26 keys, VGG16 features `[:-2]`) | `torchvision.models.vgg16(weights="IMAGENET1K_V1")` | BSD-3-Clause | No (ImageNet classification) |
| `pool.conv.weight` (64, 512, 1, 1) | `torch.manual_seed(0)` → arch-default init | Project-owned | No |
| `pool.conv.bias` (64,) | Same | Project-owned | No |
| `pool.centroids` (64, 512) | Same | Project-owned | No |
| `pca.weight` (4096, 32768) | Same | Project-owned | No |
| `pca.bias` (4096,) | Same | Project-owned | No |

Total: 31 state_dict keys; loads strictly into `make_net_vlad_vgg16(num_clusters=64, encoder_dim=512, descriptor_dim=4096)`.
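The header's parameter count, fp32 size and key count can be re-derived from the shapes in the table. The sketch below uses plain Python. The channel list is the standard VGG16 `features` configuration up to `features[28]`: 13 3×3 convolutions, each with a weight and a bias.

```python
# (in_channels, out_channels) of the 13 3x3 convs kept by VGG16 features[:-2]
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def param_breakdown(num_clusters=64, encoder_dim=512, descriptor_dim=4096):
    """Return (encoder, pool, pca) parameter counts for the table's shapes."""
    encoder = sum(c_in * c_out * 3 * 3 + c_out for c_in, c_out in VGG16_CONVS)
    pool = (
        num_clusters * encoder_dim    # pool.conv.weight (64, 512, 1, 1)
        + num_clusters                # pool.conv.bias (64,)
        + num_clusters * encoder_dim  # pool.centroids (64, 512)
    )
    vlad_dim = num_clusters * encoder_dim              # 32768
    pca = descriptor_dim * vlad_dim + descriptor_dim   # pca.weight + pca.bias
    return encoder, pool, pca


num_keys = 2 * len(VGG16_CONVS) + 3 + 2  # encoder weight/bias pairs + pool + pca
total = sum(param_breakdown())
print(total, f"{total * 4 / 2**20:.1f} MiB", num_keys)  # 149002112 568.4 MiB 31
```

The PCA projection alone accounts for about 90% of the parameters (134,221,824 of 149,002,112).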

## Encoder licence (BSD-3-Clause)

`torchvision.models.vgg16` weights are distributed by PyTorch under the BSD-3-Clause licence:

> Copyright (c) 2016-, PyTorch Contributors.
>
> Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: …

Full text: https://github.com/pytorch/vision/blob/main/LICENSE (torchvision project). The model weights themselves are derived from the ImageNet dataset; commercial use of ImageNet-derived models is subject to the ImageNet terms of access (https://www.image-net.org/download.php).

## How to reproduce

```bash
# From repo root, in the project virtualenv:
source .venv/bin/activate

# Downloading the torchvision IMAGENET1K_V1 weights requires HTTPS
# certificate validation. On macOS, Python from the python.org installer
# does not use the system trust store by default, so export certifi's
# CA bundle instead:
export SSL_CERT_FILE="$(python -c 'import certifi; print(certifi.where())')"

# Generate the checkpoint:
python scripts/mk_netvlad_checkpoint.py
# → writes models/net_vlad/net_vlad.pt
```

The script is **deterministic**. It calls `torch.manual_seed(0)` before initializing the random layers, and the IMAGENET1K_V1 weights are content-addressed. Re-running it on a different machine with the same `torch` and `torchvision` versions should yield the same SHA-256.
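A regenerated checkpoint can be compared against the header's SHA-256 with a short stdlib-only sketch. The helper names are hypothetical and are not part of `scripts/mk_netvlad_checkpoint.py`. From a shell, `shasum -a 256 models/net_vlad/net_vlad.pt` prints the same digest.

```python
import hashlib
from pathlib import Path

# SHA-256 recorded in this document's header.
EXPECTED_SHA256 = "745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444"


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-hundred-MiB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(
    path: Path = Path("models/net_vlad/net_vlad.pt"),
    expected: str = EXPECTED_SHA256,
) -> None:
    """Raise ValueError if the file's SHA-256 differs from the expected digest."""
    actual = sha256_file(path)
    if actual != expected:
        raise ValueError(f"{path}: sha256 {actual} != expected {expected}")
```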

## Why this isn't a real-retrieval checkpoint

AZ-965 was scoped at 3 story points (SP) to unblock the AZ-840 orchestrator's skip-gate on an empty `c10_provisioning.backbones`. A real-retrieval checkpoint requires one of:

1. **Translate Nanne's Pittsburgh-30k weights** (https://github.com/Nanne/pytorch-NetVlad). Nanne's default `vladv2=False` builds `pool.conv` with `bias=False`, so their state_dict has no `pool.conv.bias` key. The project's architecture uses `bias=True`. WPCA is also stored separately, as a 1×1 convolution with weight shape `(4096, 32768, 1, 1)` (i.e. `nn.Conv2d(32768, 4096, kernel_size=1)`). It would need a reshape into the `nn.Linear` weight `pca.weight` `(4096, 32768)`. Estimated 5-8 SP for the translation script plus follow-up Tier-2 verification.
2. **Train from scratch on aerial-imagery datasets** (e.g. xView, BigEarthNet, NWPU-RESISC45). Multi-week effort with GPU compute budget.
3. **Use an internal team checkpoint** if one exists.

This is filed as the AZ-965 follow-up (see the AZ-965 spec for ticket reference).
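The key-level part of option 1 might look like the sketch below. It rests on several unverified assumptions. The first is that Nanne's checkpoint, once unwrapped (for example with `torch.load(path, map_location="cpu")["state_dict"]`), uses the keys `encoder.*`, `pool.conv.weight`, `pool.centroids`, `WPCA.0.weight` and `WPCA.0.bias`. The second is that both implementations flatten the VLAD vector in the same order. Two conversions do the work. A zero bias makes a `bias=True` soft-assignment conv compute exactly what a bias-free conv computes. Dropping the trailing 1×1 dims of the WPCA conv weight gives the equivalent `nn.Linear` weight. `translate_nanne_state_dict` is a hypothetical helper.

```python
from collections import OrderedDict
from typing import Mapping

import torch


def translate_nanne_state_dict(src: Mapping[str, torch.Tensor]) -> "OrderedDict[str, torch.Tensor]":
    """Hypothetical helper: remap a Nanne-style NetVLAD state_dict to this project's keys.

    Assumed source keys (verify against a real checkpoint before relying on this):
      encoder.*                       -> copied unchanged (both use VGG16 features[:-2])
      pool.conv.weight (K, C, 1, 1)   -> copied; pool.conv.bias synthesised as zeros if absent
      pool.centroids   (K, C)         -> copied unchanged
      WPCA.0.weight    (D, K*C, 1, 1) -> pca.weight (D, K*C)
      WPCA.0.bias      (D,)           -> pca.bias
    """
    dst: "OrderedDict[str, torch.Tensor]" = OrderedDict()
    for key, tensor in src.items():
        if key.startswith("encoder.") or key in ("pool.conv.weight", "pool.centroids"):
            dst[key] = tensor.clone()

    conv_w = src["pool.conv.weight"]
    if "pool.conv.bias" in src:
        dst["pool.conv.bias"] = src["pool.conv.bias"].clone()
    else:
        # vladv2=False has no bias; a zero bias makes a bias=True conv give the same output.
        dst["pool.conv.bias"] = torch.zeros(conv_w.shape[0], dtype=conv_w.dtype, device=conv_w.device)

    # A 1x1 conv over a (N, K*C, 1, 1) input is a linear map: drop the trailing 1x1 dims.
    wpca_w = src["WPCA.0.weight"]
    dst["pca.weight"] = wpca_w.reshape(wpca_w.shape[0], wpca_w.shape[1]).clone()
    dst["pca.bias"] = src["WPCA.0.bias"].clone()
    return dst
```

Loading the result with `load_state_dict(..., strict=True)` into `make_net_vlad_vgg16(num_clusters=64, encoder_dim=512, descriptor_dim=4096)` would catch missing or extra keys. It would not catch semantic mismatches such as a different VLAD flattening order or input normalization, which is what the Tier-2 verification is for.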

## Observable behaviour with this checkpoint

With this scaffold checkpoint and the Derkachi clip:

* `c10_provisioning.compile_engines_for_corpus` succeeds. The PyTorch FP16 runtime's `compile_engine` is a no-op that only computes the `.pt` file's SHA-256 and records its path.
* `c2_vpr.NetVladStrategy.create()` succeeds. The encoder, pool and PCA all load, and the output shape `(1, 4096)` matches `descriptor_dim`.
* `embed_query` produces valid `(1, 4096)` fp16 vectors per frame.
* `retrieve_topk` produces top-K matches, but they are effectively random, because the NetVLAD pool and PCA never learned a semantic embedding space.
* Downstream ESKF measurement updates fed from random tile matches will likely diverge. This surfaces as a SEPARATE failure mode, NOT the empty-backbones gate that AZ-965 closed.

That ESKF divergence under garbage retrievals is the EXPECTED next gate for the orchestrator chain, and is a separate ticket from AZ-965.
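The claim that retrievals are effectively random can be checked directly. Recall@K is the fraction of query frames with at least one true-positive tile in the top K. It can be compared against the chance level for a uniformly random ranking. Below is a minimal stdlib-only sketch. The source of the per-frame positive tile sets (for example, tiles near the clip's ground-truth track) is an assumption, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[Sequence[int]], positives: Sequence[Set[int]], k: int) -> float:
    """Fraction of queries whose top-k retrieved tile ids contain at least one positive."""
    if len(retrieved) != len(positives):
        raise ValueError("need exactly one positive set per query")
    hits = sum(1 for ranked, pos in zip(retrieved, positives) if pos.intersection(ranked[:k]))
    return hits / len(retrieved)


def chance_recall_at_k(num_tiles: int, num_positives: int, k: int) -> float:
    """P(at least one of num_positives tiles lands in the top k) under a uniformly random ranking."""
    return 1.0 - math.comb(num_tiles - num_positives, k) / math.comb(num_tiles, k)
```

A scaffold checkpoint is expected to score close to `chance_recall_at_k`, while a trained checkpoint should score well above it.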