mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:51:12 +00:00
97f5f9793c
AZ-965 ships the NetVLAD .pt checkpoint that clears the AZ-839
empty-c10_provisioning.backbones SKIP gate. Pipeline-integration
scaffold — encoder is real, NetVLAD tail is honestly labelled as
untrained.
Composition:
* Encoder (26 keys, encoder.0..encoder.28): torchvision
vgg16(weights=IMAGENET1K_V1) features [:-2], BSD-3-Clause.
Real ImageNet-pretrained VGG16 conv stack.
* NetVLAD pool + PCA tail (5 keys: pool.conv.{weight,bias},
pool.centroids, pca.{weight,bias}): random-init via
torch.manual_seed(0). NOT trained for visual place recognition.
Total: 149,002,112 params (568.4 MiB fp32, sha256=745c6f29...).
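The stated total can be cross-checked with a short calculation. The sketch below assumes the standard VGG16 3x3 conv layout, 64 NetVLAD clusters over 512 channels, and a dense PCA projection from 32768 to 4096 dimensions. The commit does not state the cluster count or the PCA layer type, so those are assumptions; they do reproduce the 26 encoder keys, the 149,002,112 total, and the 568.4 MiB size.

```python
# Back-of-envelope parameter count for the checkpoint composition above.
# ASSUMPTIONS (not stated in the commit): 64 clusters, 512 feature channels,
# a 1x1 conv for soft assignment, and a dense 32768 -> 4096 PCA projection.

# (in_channels, out_channels) of the 13 conv layers in the VGG16 feature stack.
VGG16_CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a square 2D convolution."""
    return c_in * c_out * kernel * kernel + c_out


def count_params(num_clusters=64, feat_dim=512, out_dim=4096):
    encoder = sum(conv_params(i, o) for i, o in VGG16_CONV_CHANNELS)
    pool_conv = conv_params(feat_dim, num_clusters, kernel=1)  # pool.conv.{weight,bias}
    centroids = num_clusters * feat_dim                        # pool.centroids
    vlad_dim = num_clusters * feat_dim                         # flattened VLAD vector
    pca = vlad_dim * out_dim + out_dim                         # pca.{weight,bias}
    return {"encoder": encoder, "pool": pool_conv + centroids, "pca": pca}


counts = count_params()
total = sum(counts.values())
size_mib = total * 4 / 2**20  # fp32 = 4 bytes per parameter
encoder_keys = 2 * len(VGG16_CONV_CHANNELS)  # one weight + one bias per conv

print(counts, total, round(size_mib, 1), encoder_keys)
```

Under these assumptions the PCA projection accounts for roughly 90% of the parameters, so the untrained tail, not the pretrained encoder, dominates the file size.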
Round-trip verified locally: torch.load(weights_only=True) +
load_state_dict(strict=True) succeed; forward(1,3,480,480) emits
{'vlad_descriptor': (1, 4096) fp32} — matches NetVladStrategy
contract per net_vlad.py:247-251.
Two material discoveries documented in the AZ-965 spec:
1. The NetVLAD-VGG16 architecture already lives in the repo at
src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py;
we instantiate it and save a state_dict rather than sourcing the
model externally.
2. The PyTorch FP16 runtime expects a .pt state_dict (NOT .onnx).
BackboneConfig.onnx_path is a misnomer for NetVLAD: per AZ-321
design + c2_vpr description.md §1, NetVLAD runs on PyTorch FP16
(NOT TRT). compile_engine is a no-op sha256+path wrap;
deserialize_engine does torch.load(weights_only=True) +
load_state_dict(strict=True).
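A "sha256+path wrap" of the kind described for compile_engine could look roughly like the sketch below. The names `EngineHandle` and `wrap_checkpoint` are hypothetical and are not taken from the repo; the real function signature is not shown in this commit.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class EngineHandle:
    """Hypothetical record: where the checkpoint lives and its content hash."""
    path: Path
    sha256: str


def wrap_checkpoint(path, chunk_size=1 << 20):
    """Hash a checkpoint file in chunks and return a path + digest record.

    No compilation happens here; the file is only read, which matches the
    "no-op" description above. Chunked reads keep memory flat for a file
    of several hundred MiB.
    """
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return EngineHandle(path=path, sha256=digest.hexdigest())
```

A caller could compare the returned digest against a recorded one before handing the path to the deserialization step.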
The user skipped the Option A/B/C/D/E question, so Option B
(IMAGENET1K_V1 encoder + random tail) was chosen as a judgment call,
per "use judgment, don't block":
* Option A (Nanne translation) was 5-8 SP, above the 5 SP budget.
* Option B is 3 SP, fits the budget, honestly labelled.
* Option C (pure random) was borderline-dishonest per Real Results.
Files:
* scripts/mk_netvlad_checkpoint.py — deterministic generator.
* models/netvlad/netvlad.pt — 568 MiB, via git-lfs (.gitattributes
extended for models/**/*.pt, *.onnx, *.engine).
* configs/operator_replay.yaml — c2_vpr + c10_provisioning blocks
populated; the field literally named onnx_path actually points
at the .pt for NetVLAD per the runtime semantics noted above.
* docker-compose.test.jetson.yml — ./models:/opt/models:ro bind
mount added to e2e-runner.
* _docs/03_ip_attribution/netvlad.md — provenance, licence, how-to-
reproduce, honest scope statement ("NOT a real-retrieval
checkpoint; ESKF divergence under garbage retrievals is the
expected next gate").
* _docs/02_tasks/todo/AZ-965_netvlad_onnx_backbone_provisioning.md
— rewritten to reflect the .pt-not-.onnx + Option B discoveries.
Tier-2 verification follows in a separate commit after the harness
run confirms the empty-backbones SKIP gate clears.
Out of scope (filed as follow-ups):
* Real-retrieval NetVLAD weights (Nanne Pittsburgh-30k translation
or internal team checkpoint) — separate ticket.
* AZ-840 orchestrator PASSing end-to-end (depends on retrieval
quality + ESKF stability).
* AZ-963 60s smoke ESKF divergence (independent chain).
Co-authored-by: Cursor <cursoragent@cursor.com>
82 lines
3.1 KiB
YAML
# AZ-962: Operator pre-flight + replay-mode config for Tier-2 Jetson e2e harness.
#
# Consumed by `tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache`
# (the AZ-839 C3 fixture) which `load_config(env, paths=[this])` then drives the
# AZ-840 7-step orchestrator (`test_az835_e2e_real_flight.py`).
#
# Most fields stay at their dataclass defaults (see
# `src/gps_denied_onboard/components/{c6_tile_cache,c7_inference,c10_provisioning,c11_tile_manager}/config.py`).
# The blocks are declared here primarily so the four-component contract the
# fixture skip-gate cites is satisfied by inspection of this file. The env
# vars below are filled by docker-compose.test.jetson.yml / `.env.test`:
#
# * `GPS_DENIED_FC_PROFILE`, `GPS_DENIED_TIER`, `DB_URL` → runtime
# * `INFERENCE_BACKEND`, `TILE_CACHE_PATH`, `CAMERA_CALIBRATION_PATH` → runtime
# * `LOG_LEVEL`, `LOG_SINK` → log
# * `FDR_PATH` → fdr
# * `SATELLITE_PROVIDER_URL` → c11_tile_manager.satellite_provider_url
# * `SATELLITE_PROVIDER_API_KEY` → c11_tile_manager.service_api_key
#
# AZ-965 (2026-05-29): `c10_provisioning.backbones` now declares a
# single NetVLAD-VGG16 entry pointing at `models/netvlad/netvlad.pt`
# (568 MiB git-lfs blob; see `_docs/03_ip_attribution/netvlad.md` for
# provenance: VGG16 encoder = torchvision IMAGENET1K_V1 BSD, NetVLAD
# pool + PCA tail = deterministic-random untrained). Bind-mounted into
# the e2e-runner at `/opt/models` via docker-compose.test.jetson.yml.
# AZ-321 design: NetVLAD runs on the PyTorch FP16 runtime (NOT TRT),
# so the field literally named `onnx_path` here is actually the path
# to the `.pt` PyTorch state_dict the runtime consumes.

__top__:
  mode: replay

runtime:
  fc_profile: ardupilot_plane
  tier: 2

replay:
  pace: asap
  target_fc_dialect: ardupilot_plane

c6_tile_cache:
  store_runtime: postgres_filesystem
  metadata_runtime: postgres_filesystem
  descriptor_index_runtime: faiss_hnsw
  postgres_pool_size: 4
  lru_eviction_threshold_bytes: 10737418240  # 10 GiB

c7_inference:
  runtime: pytorch_fp16
  thermal_poll_hz: 1.0
  engine_cache_dir: /var/lib/gps-denied/engines
  gpu_memory_budget_bytes: 4294967296  # 4 GiB
  trtexec_timeout_s: 600
  ort_trt_cache_dir: /var/lib/gps-denied/engines/ort_trt_cache

c2_vpr:
  strategy: net_vlad
  backbone_weights_path: /opt/models/netvlad/netvlad.pt
  netvlad_descriptor_dim: 4096
  warn_top1_threshold: 0.30
  # faiss_index_path is overlaid at runtime by
  # tests/e2e/replay/_e2e_orchestrator.py::write_effective_replay_config
  # to point at <cache_root>/descriptor.index (the C3 fixture's tmp).

c10_provisioning:
  workspace_mb: 4096
  backbones:
    - model_name: net_vlad
      onnx_path: /opt/models/netvlad/netvlad.pt
      expected_input_shape: [3, 480, 480]
      input_name: input

c11_tile_manager:
  # satellite_provider_url + service_api_key flow in from env vars
  # (SATELLITE_PROVIDER_URL / SATELLITE_PROVIDER_API_KEY) via the
  # loader's ENV_KEY_MAP additions in AZ-962.
  upload_batch_size: 25
  upload_http_timeout_s: 30.0
  download_http_timeout_s: 30.0
  download_max_5xx_retries: 4
  download_resolution_floor_m_per_px: 0.5
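Because the field named `onnx_path` carries a `.pt` path for NetVLAD, a small consistency check on the parsed config may help catch drift between the `c2_vpr` and `c10_provisioning` blocks. The sketch below is illustrative only: `BackboneEntry` and `check_backbone_paths` are hypothetical names, the dataclass mirrors just the four keys visible in this file, and the config is given as an already-parsed dict rather than loaded through the repo's `load_config`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class BackboneEntry:
    """Hypothetical mirror of one `c10_provisioning.backbones` item."""
    model_name: str
    onnx_path: str  # for net_vlad this actually holds a .pt state_dict path
    expected_input_shape: list
    input_name: str


def check_backbone_paths(config):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    vpr_path = config["c2_vpr"]["backbone_weights_path"]
    entries = [BackboneEntry(**raw) for raw in config["c10_provisioning"]["backbones"]]
    if not entries:
        problems.append("c10_provisioning.backbones is empty (SKIP gate would trigger)")
    for entry in entries:
        if entry.model_name != "net_vlad":
            continue  # other backbones may legitimately use .onnx
        if PurePosixPath(entry.onnx_path).suffix != ".pt":
            problems.append(f"net_vlad onnx_path should be a .pt file: {entry.onnx_path}")
        if entry.onnx_path != vpr_path:
            problems.append("c2_vpr.backbone_weights_path differs from net_vlad onnx_path")
    return problems


# Parsed form of the two relevant blocks from the file above.
sample_config = {
    "c2_vpr": {"backbone_weights_path": "/opt/models/netvlad/netvlad.pt"},
    "c10_provisioning": {
        "backbones": [
            {
                "model_name": "net_vlad",
                "onnx_path": "/opt/models/netvlad/netvlad.pt",
                "expected_input_shape": [3, 480, 480],
                "input_name": "input",
            }
        ]
    },
}

print(check_backbone_paths(sample_config))
```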