gps-denied-onboard

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 10:31:13 +00:00

Author	SHA1	Message	Date
Oleksandr Bezdieniezhnykh	97f5f9793c	[AZ-965] NetVLAD-VGG16 backbone checkpoint + YAML/compose wiring AZ-965 ships the NetVLAD .pt checkpoint that clears the AZ-839 empty-c10_provisioning.backbones SKIP gate. Pipeline-integration scaffold — encoder is real, NetVLAD tail is honestly labelled as untrained. Composition: * Encoder (26 keys, encoder.0..encoder.28): torchvision vgg16(weights=IMAGENET1K_V1) features [:-2], BSD-3-Clause. Real ImageNet-pretrained VGG16 conv stack. * NetVLAD pool + PCA tail (5 keys: pool.conv.{weight,bias}, pool.centroids, pca.{weight,bias}): random-init via torch.manual_seed(0). NOT trained for visual place recognition. Total: 149,002,112 params (568.4 MiB fp32, sha256=745c6f29...). Round-trip verified locally: torch.load(weights_only=True) + load_state_dict(strict=True) succeed; forward(1,3,480,480) emits {'vlad_descriptor': (1, 4096) fp32} — matches NetVladStrategy contract per net_vlad.py:247-251. Two material discoveries documented in the AZ-965 spec: 1. The NetVLAD-VGG16 architecture already lives in repo at src/gps_denied_onboard/components/c2_vpr/_net_vlad_architecture.py — we instantiate it and save a state_dict, NOT externally source. 2. The PyTorch FP16 runtime expects a .pt state_dict (NOT .onnx). BackboneConfig.onnx_path is a misnomer for NetVLAD: per AZ-321 design + c2_vpr description.md §1, NetVLAD runs on PyTorch FP16 (NOT TRT). compile_engine is a no-op sha256+path wrap; deserialize_engine does torch.load(weights_only=True) + load_state_dict(strict=True). User skipped Option A/B/C/D/E question — judgment call = Option B (IMAGENET1K_V1 + random tail) per "use judgment, don't block": * Option A (Nanne translation) was 5-8 SP, above the 5 SP budget. * Option B is 3 SP, fits the budget, honestly labelled. * Option C (pure random) was borderline-dishonest per Real Results. Files: * scripts/mk_netvlad_checkpoint.py — deterministic generator. 
* models/netvlad/netvlad.pt — 568 MiB, via git-lfs (.gitattributes extended for models/*/.pt, .onnx, .engine). * configs/operator_replay.yaml — c2_vpr + c10_provisioning blocks populated; the field literally named onnx_path actually points at the .pt for NetVLAD per the runtime semantics noted above. * docker-compose.test.jetson.yml — ./models:/opt/models:ro bind mount added to e2e-runner. * _docs/03_ip_attribution/netvlad.md — provenance, licence, how-to-reproduce, honest scope statement ("NOT a real-retrieval checkpoint; ESKF divergence under garbage retrievals is the expected next gate"). * _docs/02_tasks/todo/AZ-965_netvlad_onnx_backbone_provisioning.md — rewritten to reflect the .pt-not-.onnx + Option B discoveries. Tier-2 verification follows in a separate commit after the harness run confirms the empty-backbones SKIP gate clears. Out of scope (filed as follow-ups): * Real-retrieval NetVLAD weights (Nanne Pittsburgh-30k translation or internal team checkpoint) — separate ticket. * AZ-840 orchestrator PASSing end-to-end (depends on retrieval quality + ESKF stability). * AZ-963 60s smoke ESKF divergence (independent chain). Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 18:03:32 +03:00
Oleksandr Bezdieniezhnykh	288aae881d	[AZ-964] FAISS index bootstrap for AZ-839 fixture + build flag AZ-964 SHIPPED — AZ-840 orchestrator test moves past FAISS gate. Changes: * tests/e2e/replay/_faiss_seed.py — extracts the empty HNSW32 seeding logic from scripts/mk_test_faiss_fixture.py into a reusable test-infra module: seed_empty_faiss_index(root_dir, *, descriptor_dim=512, backbone_label="ultra_vpr") -> Path. scripts/mk_test_faiss_fixture.py rewritten as a thin CLI shim importing the same helper. compose `tile-init` contract is preserved. * tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache now calls seed_empty_faiss_index(cache_root) immediately before build_descriptor_index(config), so the factory's _load() finds a valid .index + .sha256 + .meta.json triplet at the fixture's override root_dir. populate_c6_from_route later in the fixture rebuilds the real index once route tiles are downloaded. * docker-compose.test.jetson.yml: BUILD_PYTORCH_FP16_RUNTIME: "ON" added to e2e-runner.environment. Scope creep documented honestly in the spec — Tier-2 surfaced this third config gap on the same fixture chain while validating AZ-964 (RuntimeNotAvailableError: ... the flag is OFF). One-line wiring; the dustynv/l4t-pytorch base image bakes the Tegra-tuned PyTorch wheel and pytorch_fp16_runtime.py exists, so flag flip is sufficient. Tier-2 verdict (4F / 48P / 3S / 1XF / 1XP in 86.07s, 0 errors — was 2 errors before this commit): AZ-840 orchestrator test moves from ERROR at FAISS gate to SKIP at empty-backbones gate — exactly the AZ-965 gate AZ-964 AC-3 promised. test_operator_pre_flight_integration SKIPs cleanly too. The 4 derkachi_1min ESKF-divergence FAILs are constant across all three runs today (AZ-963 path, independent of orchestrator chain). Three Tier-2 runs today on the orchestrator chain: i. pre-AZ-962: SKIP at env-var gate ii. post-AZ-962: ERROR at FAISS gate iii. post-AZ-964: SKIP at backbones gate (AZ-965) Cycle-4 e2e gate still NOT GREEN.
Orchestrator chain remaining = AZ-965 (NetVLAD backbone provisioning); 60s smoke chain remaining = AZ-963 (ESKF divergence). OKVIS2 deferral directive unchanged. Pre-existing yamllint false positive on docker-compose.test.jetson.yml:185 (sibling `volumes:` keys flagged as duplicates without respecting parent-key scope) — PyYAML parses cleanly with no duplicates and docker-compose accepts the file at runtime. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 17:02:49 +03:00
Oleksandr Bezdieniezhnykh	763d8b21ad	[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer SKIPs at the env-var gate. configs/operator_replay.yaml registers c6/c7/c10/c11 with sane defaults (backbones intentionally empty, see AZ-965); docker-compose.test.jetson.yml exports GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key so secrets flow from .env.test and never sit in YAML. README drops the manual export step. 97/97 c11 + config unit tests stay green. Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2 skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to ERROR with a deeper, real gate — IndexUnavailableError on FaissDescriptorIndex against a fresh c6_tile_cache.root_dir. AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839 C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for NetVLAD ONNX backbone provisioning — the next gate the orchestrator test will hit once FAISS clears. Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 → AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral directive (2026-05-29) unchanged — still gated behind Derkachi e2e green, still NOT MET. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 16:42:55 +03:00
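For readers unfamiliar with the pattern in the AZ-962 message, the following is a minimal sketch of how an environment-to-config map like ENV_KEY_MAP could overlay secrets onto a YAML-loaded config. Only the two map entries come from the commit message; the helper name `apply_env_overrides`, its signature, and the nested-dict config shape are assumptions for illustration, not the repository's actual loader.

```python
import copy
from typing import Any, Mapping

# The two entries named in the AZ-962 message; the real map may hold more.
ENV_KEY_MAP = {
    "SATELLITE_PROVIDER_URL": "c11_tile_manager.satellite_provider_url",
    "SATELLITE_PROVIDER_API_KEY": "c11_tile_manager.service_api_key",
}


def apply_env_overrides(config: dict[str, Any], env: Mapping[str, str]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of config with mapped env values applied.

    Assumes the config is a nested dict keyed by component block name.
    """
    merged = copy.deepcopy(config)  # leave the YAML-derived dict untouched
    for env_name, dotted_key in ENV_KEY_MAP.items():
        if env_name not in env:
            continue  # an unset variable leaves the YAML value (if any) in place
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = env[env_name]
    return merged
```

In use, something like `apply_env_overrides(yaml_config, os.environ)` would run once at startup, so the API key exists only in the process environment and the merged in-memory config, never in the committed YAML.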
Oleksandr Bezdieniezhnykh	38170b3499	[AZ-894] [AZ-895] e2e harnesses: enable BUILD_CSV_REPLAY_ADAPTER=ON AZ-894 added the CSV adapter behind BUILD_CSV_REPLAY_ADAPTER; AZ-895 made the (video, CSV) path the primary replay surface. The two e2e compose files (docker-compose.test.yml + docker-compose.test.jetson.yml) were never updated to set the flag, so the airborne replay binary inside the e2e-runner container hit FcAdapterConfigError as soon as the composition root tried to construct CsvReplayFcAdapter. Caught by a Jetson harness run (5 failures, all in tests/e2e/replay/test_derkachi_1min.py, all with the same stack and the same root cause). After this fix the Jetson run drops to 4 failures, all sharing the AZ-848 ESKF-divergence root cause — handled in the follow-up commit. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-26 22:52:18 +03:00
Oleksandr Bezdieniezhnykh	811b04e605	[AZ-777] Phase 1: wire e2e-runner to real satellite-provider + C11 contract adapt Adapt C11 HttpTileDownloader to the AZ-505 v1.0.0 tile-inventory contract (POST /api/satellite/tiles/inventory + GET /tiles/{z}/{x}/{y}) and wire the Jetson e2e harness against the real parent-suite satellite-provider service. Closes Phase 1 of 5 for AZ-777; STOP gate before Phase 2 (Derkachi catalog seed). C11 changes: - _LIST_PATH / _GET_PATH replaced with _INVENTORY_PATH + _TILES_PATH. - _do_enumerate enumerates bbox tile coords client-side and posts chunked inventory requests (5000-entry cap per the contract). - _download_one_tile parses tile_id_str into (z,x,y) and fetches the slippy-map URL. - Common GET / POST retry+auth ladder consolidated into _send_request. - New module helpers: _enumerate_bbox_tile_coords, _tile_center_latlon, _tile_size_meters_at, _format_tile_id_str, _parse_tile_id_str, _chunk_iter. - _DEFAULT_ESTIMATED_TILE_BYTES (50 KiB) replaces the inventory-side estimatedBytes field the v1.0.0 contract dropped. Tests: - 14/14 unit tests in tests/unit/c11_tile_manager/test_tile_downloader.py rewritten for the new POST inventory + slippy-map GET handler. _StubTileWriter rekeyed by call-index (the downloader now derives lat/lon from the slippy-map coord, so fixtures can't fabricate arbitrary positions). - New Tier-2 smoke at tests/e2e/satellite_provider/test_smoke.py: validates inventory POST schema + drives HttpTileDownloader against the real service. Gated by RUN_REPLAY_E2E=1 + tier2. Compose / env: - e2e-runner SATELLITE_PROVIDER_URL switched from mock-sat:5100 to https://satellite-provider:8080; TLS_INSECURE + Bearer JWT env + depends_on satellite-provider added. - .env.test.example documents SATELLITE_PROVIDER_API_KEY + dev TLS bypass security note. - scripts/mint_dev_jwt.py mints HS256 dev JWTs from env / .env.test. - pyjwt added to dev extras. 
Tracker hygiene: - AZ-777 row in _dependencies_table.md bumped 5pt -> 8pt to match the 2026-05-21 override decision log. Code review: PASS_WITH_WARNINGS (3 medium/low findings, all deferred to later AZ-777 phases) -- see batch_104_review.md. Batch report at batch_104_cycle3_report.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-21 14:52:39 +03:00
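The client-side enumeration described in the AZ-777 message can be sketched as below, assuming the standard Web Mercator slippy-map tiling scheme. The function names here are illustrative stand-ins, not the signatures of the repository's `_enumerate_bbox_tile_coords` or `_chunk_iter`; only the 5000-entry cap comes from the commit message. The sketch does not handle bounding boxes that cross the antimeridian.

```python
import math
from itertools import islice
from typing import Iterable, Iterator

INVENTORY_CHUNK_CAP = 5000  # per-request entry cap stated in the commit message


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Standard slippy-map (x, y) tile index containing a WGS84 point."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon == 180 or near-polar latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def enumerate_bbox_tiles(
    south: float, west: float, north: float, east: float, zoom: int
) -> Iterator[tuple[int, int, int]]:
    """Yield (z, x, y) for every tile overlapping the bounding box."""
    x_min, y_max = latlon_to_tile(south, west, zoom)  # tile y grows southward
    x_max, y_min = latlon_to_tile(north, east, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield zoom, x, y


def chunked(items: Iterable, size: int = INVENTORY_CHUNK_CAP) -> Iterator[list]:
    """Split an iterable into lists of at most `size` entries."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each list produced by `chunked(enumerate_bbox_tiles(...))` would then become the body of one inventory POST, which keeps every request under the contract's cap regardless of bounding-box size.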
Oleksandr Bezdieniezhnykh	21a7784682	[AZ-701] Fix Jetson e2e harness infrastructure blockers - gtsam_isam2_estimator: shim for gtsam>=4.3a0 aarch64 pre-release where IncrementalFixedLagSmoother/FixedLagSmootherKeyTimestampMap moved from gtsam_unstable to gtsam - inference_factory: eager import of c7_inference package so register_component_block runs before config.components is read - docker-compose.test.jetson.yml: remove companion and operator-orchestrator (not needed by replay CLI tests and crash in test env due to AZ-618 live-mode deps); add db-migrate and tile-init setup-profile services for Alembic migrations and FAISS fixture provisioning; update e2e-runner depends_on to db only - scripts/mk_test_faiss_fixture.py: generate minimal HNSW32 FAISS descriptor index into the tile-data volume for the test harness Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-20 19:01:36 +03:00
Oleksandr Bezdieniezhnykh	a7b3e60716	[autodev] Update Jetson test environment and satellite-provider integration - Added `.env.test` to `.gitignore` to exclude test environment variables. - Enhanced `docker-compose.test.jetson.yml` to include the real satellite-provider .NET service and its PostgreSQL database, replacing the mock service. - Updated test execution policy to mandate all tests run exclusively on Jetson hardware, deprecating the previous two-tier model. - Revised documentation in `_docs/LESSONS.md`, `_docs/02_document/tests/environment.md`, and `_docs/04_deploy/ci_cd_pipeline.md` to reflect the new testing strategy and environment setup. - Improved `run-tests-jetson.sh` script to ensure proper environment variable handling and satellite-provider integration. This commit aligns the testing framework with production environments, enhancing reliability and coverage.	2026-05-20 13:22:51 +03:00
Oleksandr Bezdieniezhnykh	324bbd6367	[AZ-602] e2e compose: set all three replay BUILD_* flags REPLAY_BUILD_FLAGS contains three names but the test compose files only ever set BUILD_REPLAY_SINK_JSONL. Every prior Reality-Gate run hit the auto-sync hard-fail before reaching the VideoFileFrameSource or TlogReplayFcAdapter build-flag gates, so the omission stayed hidden. AZ-611 makes tests bypass auto-sync, which exposes the next gate: VideoFileFrameSource raises FrameSourceConfigError ("BUILD_VIDEO_FILE_FRAME_SOURCE is OFF; ... unavailable"). Mirror the airborne binary's flag requirements in both docker-compose.test.yml (Colima Tier-1) and docker-compose.test.jetson.yml (Jetson Tier-2). Comment block in both files documents why all three must be ON. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 09:04:35 +03:00
Oleksandr Bezdieniezhnykh	9c13ab3bd0	[AZ-615] [AZ-617] Add Jetson e2e harness + tier2 marks C7 inference (PytorchFp16Runtime / TensorRTRuntime / OnnxTrtEpRuntime) is CUDA-only by design — `model.half().cuda()` is hard-wired with no CPU fallback. The Colima/Tier-1 smoke harness can never exercise C3 matcher or C7 inference. Once AZ-614 fixes the tlog time-base mismatch and the pipeline reaches those stages, Colima runs would hard-fail at `.cuda()` instead of cleanly skipping. This commit lays down the Jetson companion harness and wires the existing `tier2` auto-skip: * tests/e2e/Dockerfile.jetson — l4t-pytorch:r36.4.0-pth2.3-py3 base, same /opt layout as the Colima image so AC-4 AST scan + bind mounts work identically. Built ON the Jetson via run-tests-jetson.sh. * docker-compose.test.jetson.yml — mirrors docker-compose.test.yml but with `runtime: nvidia`, GPU device exposure, and GPS_DENIED_TIER=2 (turns OFF the tier2 auto-skip). * scripts/run-tests-jetson.sh — rsync → ssh build → ssh up, exit-code-from e2e-runner so the local exit code reflects the remote test verdict. No credentials in the repo; uses `ssh jetson-e2e` alias resolved via ~/.ssh/config. * _docs/03_implementation/jetson_harness_setup.md — one-time SSH key + alias + sshd hardening + GPU verification steps. Documents the smoke vs. Reality Gate split + the GPS_DENIED_TIER switch. AZ-617 (mark heavy ACs with tier2): adds @pytest.mark.tier2 to AC-1, AC-2, AC-3, AC-5, AC-6 in tests/e2e/replay/test_derkachi_1min.py. Reuses the existing tier2 marker + auto-skip in tests/conftest.py (scope revision documented as a comment on AZ-617). AC-4a/4b/AC-7/AC-9 stay unmarked — they don't touch CUDA. 
Defers to follow-up Jira: * AZ-614 — Derkachi tlog synth time-base mismatch (unblocks tier2 ACs actually reaching the GPU stage on the Jetson) * AZ-616 — replace mock-sat with real ../satellite-provider service Not run yet: the harness needs operator-side SSH setup to come online before scripts/run-tests-jetson.sh can be executed end-to-end. Setup steps documented in jetson_harness_setup.md. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-18 01:57:23 +03:00

9 Commits