# NetVLAD-VGG16 Checkpoint — Provenance & License

**Artifact**: `models/net_vlad/net_vlad.pt`

**Note**: The file stem MUST equal `c2_vpr.net_vlad.MODEL_NAME` (`"net_vlad"`), because the PyTorch FP16 runtime uses `path.stem` as the architecture-registry lookup key.

**Generated**: 2026-05-29 (AZ-965)

**Architecture**: project-owned `_NetVladVgg16` in `src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py`

**Parameters**: 149,002,112 (~568.4 MiB in fp32)

**SHA-256**: `745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444`
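
The **Note** on the file stem is easy to break with an innocent rename. The following is a minimal sketch of why. `ARCH_REGISTRY`, `resolve_architecture`, and the placeholder factory are hypothetical stand-ins, not the runtime's actual code. The sketch shows only the stem-keyed lookup the Note describes.

```python
from pathlib import Path
from typing import Callable, Dict

# HYPOTHETICAL stand-in for the runtime's architecture registry; the real
# registry and factory signatures live in the project and are not shown here.
ARCH_REGISTRY: Dict[str, Callable[[], str]] = {
    "net_vlad": lambda: "_NetVladVgg16",  # placeholder factory
}


def resolve_architecture(checkpoint_path: str) -> str:
    """Look up the architecture factory keyed by the checkpoint's file stem."""
    key = Path(checkpoint_path).stem  # "models/net_vlad/net_vlad.pt" -> "net_vlad"
    try:
        factory = ARCH_REGISTRY[key]
    except KeyError:
        raise KeyError(
            f"no architecture registered for stem {key!r} (from {checkpoint_path})"
        ) from None
    return factory()


print(resolve_architecture("models/net_vlad/net_vlad.pt"))  # prints _NetVladVgg16
# A file named netvlad.pt has stem "netvlad", so the lookup raises KeyError:
# resolve_architecture("models/netvlad/netvlad.pt")
```

A renamed file therefore fails at lookup time rather than at load time, even though the weights inside it are unchanged.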

This checkpoint is a **pipeline-integration scaffold**, not a retrieval-quality artifact. The encoder weights come from a real public source (torchvision `IMAGENET1K_V1`), but the NetVLAD pool and PCA tail are deterministically random-initialized: they have NOT been trained for visual place recognition. The orchestrator runs end-to-end with these weights, but its retrieval results are effectively random.
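
The standard NetVLAD formulation (Arandjelović et al., CVPR 2016) shows which untrained parameters the pool and PCA tail contribute. This is a sketch of the textbook layer, not a transcription of `_NetVladVgg16`. In particular, the exact normalization steps in the project's architecture are assumed here, not confirmed. The setup uses $K = 64$ clusters and $D = 512$ encoder channels. Each local descriptor $\mathbf{x}_i \in \mathbb{R}^{D}$ sits at one of the $N$ spatial positions of the encoder output.

```math
a_k(\mathbf{x}_i) = \frac{\exp\left(\mathbf{w}_k^\top \mathbf{x}_i + b_k\right)}{\sum_{k'=1}^{K} \exp\left(\mathbf{w}_{k'}^\top \mathbf{x}_i + b_{k'}\right)},
\qquad
V_{k,j} = \sum_{i=1}^{N} a_k(\mathbf{x}_i)\left(x_{i,j} - c_{k,j}\right),
\qquad
\mathbf{y} = W \operatorname{vec}(V) + \mathbf{b}
```

Each symbol maps onto a checkpoint entry:

* $\mathbf{w}_k$ and $b_k$ come from `pool.conv` (weight `(64, 512, 1, 1)`, bias `(64,)`).
* $\mathbf{c}_k$ is row $k$ of `pool.centroids` `(64, 512)`.
* $V \in \mathbb{R}^{64 \times 512}$ flattens to $64 \times 512 = 32768$ dimensions.
* `pca.weight` $W \in \mathbb{R}^{4096 \times 32768}$ and `pca.bias` $\mathbf{b}$ project that vector down to the 4096-dimensional descriptor.

In the original NetVLAD, $V$ is intra-normalized (per cluster) and then L2-normalized before the projection. Every one of these parameters is seed-initialized rather than learned, so nothing in the soft assignments $a_k$ or the projection has been fitted to place recognition.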

## Composition

| Layer | Source | License | Trained-for-VPR? |
|---|---|---|---|
| `encoder.0` … `encoder.28` (26 keys, VGG16 features `[:-2]`) | `torchvision.models.vgg16(weights="IMAGENET1K_V1")` | BSD-3-Clause | No (ImageNet classification) |
| `pool.conv.weight` (64, 512, 1, 1) | `torch.manual_seed(0)` → arch-default init | Project-owned | No |
| `pool.conv.bias` (64,) | Same | Project-owned | No |
| `pool.centroids` (64, 512) | Same | Project-owned | No |
| `pca.weight` (4096, 32768) | Same | Project-owned | No |
| `pca.bias` (4096,) | Same | Project-owned | No |

Total: 31 `state_dict` keys; the checkpoint loads strictly into `make_net_vlad_vgg16(num_clusters=64, encoder_dim=512, descriptor_dim=4096)`.
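
The key count and the **Parameters** figure can both be cross-checked from the table with plain arithmetic. This sketch rebuilds the layout from the shapes above. For the encoder it uses the standard VGG16 configuration: 13 3×3 conv layers at `features` indices 0 to 28. The `.weight`/`.bias` key suffixes follow the usual PyTorch naming and are assumed here, not read from the checkpoint.

```python
# Standard VGG16 conv layers kept by features[:-2]: (in_channels, out_channels).
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# Indices of those conv layers inside vgg16().features (ReLU/MaxPool skipped).
CONV_INDICES = [0, 2, 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28]
NUM_CLUSTERS, ENCODER_DIM, DESCRIPTOR_DIM = 64, 512, 4096


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


shapes = {}
for idx, (cin, cout) in zip(CONV_INDICES, VGG16_CONVS):
    shapes[f"encoder.{idx}.weight"] = (cout, cin, 3, 3)
    shapes[f"encoder.{idx}.bias"] = (cout,)
shapes["pool.conv.weight"] = (NUM_CLUSTERS, ENCODER_DIM, 1, 1)
shapes["pool.conv.bias"] = (NUM_CLUSTERS,)
shapes["pool.centroids"] = (NUM_CLUSTERS, ENCODER_DIM)
vlad_dim = NUM_CLUSTERS * ENCODER_DIM  # 64 * 512 = 32768
shapes["pca.weight"] = (DESCRIPTOR_DIM, vlad_dim)
shapes["pca.bias"] = (DESCRIPTOR_DIM,)

total = sum(numel(s) for s in shapes.values())
mib_fp32 = total * 4 / 2**20  # 4 bytes per fp32 parameter
print(len(shapes), total, round(mib_fp32, 1))  # 31 149002112 568.4
```

Almost 90% of the parameters (134,221,824) sit in `pca.weight` and `pca.bias`, and those weights are untrained.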

## Encoder licence (BSD-3-Clause)

`torchvision.models.vgg16` weights are distributed by PyTorch under the BSD-3-Clause licence:

> Copyright (c) 2016-, PyTorch Contributors.
>
> Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: …

Full text: https://github.com/pytorch/vision/blob/main/LICENSE (torchvision project). The model weights themselves are derived from the ImageNet dataset; commercial use of ImageNet-derived models is subject to the ImageNet terms of access (https://www.image-net.org/download.php).

## How to reproduce

```bash
# From the repo root, in the project virtualenv:
source .venv/bin/activate

# Downloading the torchvision IMAGENET1K_V1 weights requires HTTPS
# certificate validation. On macOS, Python from the python.org installer
# does not use the system trust store by default, so point it at
# certifi's CA bundle instead:
export SSL_CERT_FILE="$(python -c 'import certifi; print(certifi.where())')"

# Generate the checkpoint (writes models/net_vlad/net_vlad.pt):
python scripts/mk_netvlad_checkpoint.py
```
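
The script is **deterministic** because it sets `torch.manual_seed(0)` before building the random-init layers, and the IMAGENET1K_V1 weights are content-addressed. Re-running it on a different machine should therefore yield the same SHA-256, provided the same PyTorch and torchvision versions are used. Both the default initializers and the `torch.save` serialization format can change between releases.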
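
A streaming hash can confirm that a regenerated (or downloaded) checkpoint matches the recorded **SHA-256** without loading the whole file into memory. The following is a minimal standard-library sketch; the function names are illustrative. From the shell, `sha256sum` (GNU coreutils) or `shasum -a 256` (macOS) gives the same digest.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "745c6f29faa4e6754a74189c503189dbab1978d8ff2c65b48c95749b4e48c444"
CHECKPOINT = Path("models/net_vlad/net_vlad.pt")


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path=CHECKPOINT, expected: str = EXPECTED_SHA256) -> str:
    """Return the digest if it matches `expected`; otherwise raise ValueError."""
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError(f"{path}: SHA-256 {actual} does not match expected {expected}")
    return actual
```

Call `verify_checkpoint()` from the repo root after running the generation script. A `ValueError` there usually points to a PyTorch/torchvision version difference or a partial download, not a changed architecture.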

## Why this isn't a real-retrieval checkpoint

AZ-965 was scoped at 3 SP to unblock the AZ-840 orchestrator's skip-gate on an empty `c10_provisioning.backbones`. A real-retrieval checkpoint requires one of:

1. **Translate Nanne's Pittsburgh-30k weights** (https://github.com/Nanne/pytorch-NetVlad). Nanne's default `vladv2=False` builds `pool.conv` with `bias=False`, so their state_dict has no `pool.conv.bias` key; the project's architecture uses `bias=True`. WPCA is also stored separately, as a 1×1 `nn.Conv2d(32768, 4096, 1, 1)` (weight shape `(4096, 32768, 1, 1)`), and would need reshaping into an `nn.Linear` weight. Estimated at 5-8 SP for the translation script plus follow-up Tier-2 verification.
2. **Train from scratch on aerial-imagery datasets** (e.g. xView, BigEarthNet, NWPU-RESISC45). This is a multi-week effort that needs a GPU compute budget.
3. **Use an internal team checkpoint**, if one exists.
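
Option 1's two mismatches can be planned at the key/shape level before any tensors are moved. The sketch below is a hypothetical dry run, not an existing project script, and rests on two assumptions:

* Nanne's checkpoint uses `encoder.*`/`pool.*` module names that already match the project's.
* The separately stored WPCA conv appears under the ASSUMED keys `WPCA.0.weight` and `WPCA.0.bias`, with a weight of shape `(4096, 32768, 1, 1)`.

Adding a zero bias is exact, because a conv built with `bias=False` computes the same output as one with an all-zero bias.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]
# target key -> (source key or None, target shape, action)
Plan = Dict[str, Tuple[Optional[str], Shape, str]]

# ASSUMED source key names for the separately stored WPCA 1x1 conv.
WPCA_WEIGHT, WPCA_BIAS = "WPCA.0.weight", "WPCA.0.bias"


def plan_translation(src_shapes: Dict[str, Shape]) -> Plan:
    """Map a source {key: shape} layout onto the project's key names and shapes."""
    plan: Plan = {}
    # Encoder and pool keys are assumed to share the project's names: copy as-is.
    for key, shape in src_shapes.items():
        if key.startswith(("encoder.", "pool.")):
            plan[key] = (key, tuple(shape), "copy")
    # vladv2=False checkpoints carry no pool.conv.bias; a zero bias is equivalent.
    if "pool.conv.bias" not in src_shapes:
        num_clusters = src_shapes["pool.conv.weight"][0]
        plan["pool.conv.bias"] = (None, (num_clusters,), "zeros")
    # A 1x1 Conv2d weight (out, in, 1, 1) becomes an nn.Linear weight (out, in).
    out_dim, in_dim, kh, kw = src_shapes[WPCA_WEIGHT]
    if (kh, kw) != (1, 1):
        raise ValueError(f"expected a 1x1 WPCA conv, got kernel {(kh, kw)}")
    plan["pca.weight"] = (WPCA_WEIGHT, (out_dim, in_dim), "reshape")
    plan["pca.bias"] = (WPCA_BIAS, tuple(src_shapes[WPCA_BIAS]), "copy")
    return plan


# Abbreviated example layout (only one encoder layer shown).
nanne_like = {
    "encoder.0.weight": (64, 3, 3, 3),
    "encoder.0.bias": (64,),
    "pool.conv.weight": (64, 512, 1, 1),
    "pool.centroids": (64, 512),
    WPCA_WEIGHT: (4096, 32768, 1, 1),
    WPCA_BIAS: (4096,),
}
for target, (source, shape, action) in plan_translation(nanne_like).items():
    print(f"{target:<18} <- {source!s:<16} {action:<8} {shape}")
```

Carrying out such a plan on real tensors would take three steps:

1. Reshape the WPCA weight with `w.reshape(out_dim, in_dim)`.
2. Create the missing bias with `torch.zeros(num_clusters)`.
3. Call `load_state_dict(..., strict=True)` into `make_net_vlad_vgg16(...)` to catch anything the plan missed.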

This is filed as the AZ-965 follow-up (see the AZ-965 spec for the ticket reference).

## Observable behaviour with this checkpoint

With this scaffold checkpoint and the Derkachi clip:

* `c10_provisioning.compile_engines_for_corpus` succeeds. Under the PyTorch FP16 runtime, `compile_engine` is effectively a no-op: it only computes the SHA-256 of the `.pt` file and records its path.
* `c2_vpr.NetVladStrategy.create()` succeeds: the encoder, pool, and PCA all load, and the output shape `(1, 4096)` matches `descriptor_dim`.
* `embed_query` produces valid `(1, 4096)` fp16 vectors per frame.
* `retrieve_topk` produces top-K matches, but they are effectively random, because the NetVLAD pool and PCA never learned a semantic embedding space.
* Downstream ESKF measurement updates fed from random tile matches will likely diverge. This surfaces as a SEPARATE failure mode, NOT the empty-backbones gate that AZ-965 closed.

That ESKF divergence under garbage retrievals is the EXPECTED next gate for the orchestrator chain, and is a separate ticket from AZ-965.