Oleksandr Bezdieniezhnykh 3b7265757b [AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu
PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end
to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild
(.index + .sha256 sidecar + .meta.json), triple-consistency load
check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional
warm-up query at construction, FAISS RuntimeError rewrap to
IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config
classmethod wired into runtime_root.storage_factory.

The original spec required a custom pybind11 wrapper over a vendored
FAISS HEAD; the user opted for the upstream faiss-cpu wheel after
research fact #92 confirmed ARM64 wheel availability for Jetson and
the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/
placeholder removed; BUILD_FAISS_INDEX flag retained as a
runtime/factory gate (no native target). Spec rewritten end-to-end and
archived to _docs/02_tasks/done/.

C6TileCacheConfig extended with faiss_index_path and
faiss_warmup_query_path fields. tests/conftest.py sets
KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp
duplicate-load abort during pytest (no-op on CI Linux). 21 new tests
cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance
fake updated with from_config classmethod.

Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1
pre-existing OKVIS2 submodule failure unrelated.

Doc sync: module-layout.md, components/08_c6_tile_cache/description.md
§5, batch_35_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 04:01:37 +03:00

Batch 35 / Cycle 1 — Implementation Report

Date: 2026-05-13
Tasks: AZ-306 (C6 FaissDescriptorIndex)
Story points landed: 5
Status: complete (AZ-306 → In Testing)

Scope summary

Single-task batch landing the production-default DescriptorIndex strategy for C6 — closing the gap left open by the AZ-303 protocol contract (which only shipped the Protocol + factory) and unblocking AZ-322 (C10 Descriptor Batcher), AZ-341 (C2 FAISS HNSW Retrieve Wiring), and downstream c2_vpr/c2_5_rerank/c3_matcher tile-cache consumers.

AZ-306 — C6 FaissDescriptorIndex (faiss-cpu, HNSW32)

Architectural change vs. original spec. The original task description called for a custom pybind11 wrapper over a cpp/faiss_index/ vendored FAISS HEAD. During Step 0 of the implement skill the spec was cross-checked against three existing project artifacts:

  1. _docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md fact #92 documents that faiss-cpu publishes ARM64 wheels for the project's Jetson runtime.
  2. pyproject.toml already carried "faiss-cpu>=1.7,<2.0" in the [indexing] extras group — i.e. the wheel was the planned acquisition path all along.
  3. cpp/faiss_index/CMakeLists.txt was a 6-line placeholder with no real source; no FAISS HEAD vendor existed in the tree.

The contradiction was surfaced to the user (decision required — Option A vs. B vs. C). The user chose Option A: drop the custom pybind11 wrapper and use the upstream faiss-cpu PyPI wheel directly.

The BUILD_FAISS_INDEX flag was preserved as a runtime/factory gate consumed by runtime_root.storage_factory.build_descriptor_index; it no longer maps to a CMake build target.

Implementation. Pure-Python FaissDescriptorIndex at src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py implementing the AZ-303 DescriptorIndex Protocol surface end-to-end:

  • Search: IndexHNSWFlat(M=32) + IndexIDMap2, with efSearch applied to the wrapped HNSW at load time. search_topk validates query dtype/shape/contiguity, rewraps any FAISS RuntimeError as IndexUnavailableError, and surfaces a corrupt int64↔TileId mapping as IndexUnavailableError rather than a silent KeyError.
  • Rebuild: atomic three-file write under Sha256Sidecar. <index> is written first (via faiss.write_index to a .tmp, then Sha256Sidecar.write_atomic_and_sidecar), <index>.meta.json second (typed IndexMetadata + tile-id mapping). Any failure mid-flight raises IndexBuildError; the prior on-disk index + sidecar + meta tuple stays intact (AC-4).
  • Load: triple-consistency check, sha256(.index) == .sha256 sidecar text == meta.json::sidecar_sha256_hex. Any divergence raises IndexUnavailableError. Index opened with faiss.read_index(IO_FLAG_MMAP | IO_FLAG_READ_ONLY).
  • Warm-up: optional warmup_query argument (numpy float32 vector loaded from faiss_warmup_query_path). Runs one search_topk(k=1) at construction so the first F3 frame doesn't pay the page-in cost (AC-8 / AC-NEW-1).
  • Composition root entry: the FaissDescriptorIndex.from_config(config) classmethod mirrors PostgresFilesystemStore.from_config and is wired into runtime_root.storage_factory.build_descriptor_index.
  • Int64-id mapping: int.from_bytes(sha256(f"{zoom}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True) with explicit collision detection at rebuild time (AC-11). The tile source field is intentionally NOT in the hash input: a tile is identified by position, not by feed.
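
The Int64-id mapping above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the shipped code: the (zoom, lat, lon) tuple representation, the build_id_map helper, and the local IndexBuildError stand-in are assumptions made for the example. Only the hash expression itself comes from the report.

```python
import hashlib


class IndexBuildError(RuntimeError):
    """Local stand-in for the project's IndexBuildError (assumption)."""


def tile_id_to_int64(zoom: int, lat: float, lon: float) -> int:
    """Derive a signed 64-bit id from a tile's position only (no source field)."""
    key = f"{zoom}|{lat:.8f}|{lon:.8f}".encode("utf-8")
    # First 8 bytes of the digest, interpreted as a big-endian signed integer.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big", signed=True)


def build_id_map(tiles):
    """Map int64 ids back to tiles, failing loudly if two tiles share an id.

    `tiles` is an iterable of (zoom, lat, lon) tuples (hypothetical shape).
    """
    id_to_tile = {}
    for tile in tiles:
        int_id = tile_id_to_int64(*tile)
        existing = id_to_tile.get(int_id)
        if existing is not None and existing != tile:
            raise IndexBuildError(f"int64 id collision: {existing} vs {tile}")
        id_to_tile[int_id] = tile
    return id_to_tile
```

Note that, in this sketch, two positions that differ only beyond the eighth decimal place format to the same key and are therefore reported as a collision instead of silently overwriting each other.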

C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path. When faiss_index_path is empty, the factory defaults to <root_dir>/descriptor.index.
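
A rough sketch of the default-path rule follows. The SketchConfig dataclass is a stand-in carrying only the fields named in this report, and resolve_faiss_index_path is a hypothetical helper name; the real factory may organise this differently.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SketchConfig:
    """Stand-in for C6TileCacheConfig with only the fields relevant here."""
    root_dir: str
    faiss_index_path: str = ""
    faiss_warmup_query_path: str = ""


def resolve_faiss_index_path(config: SketchConfig) -> Path:
    """Use the explicit path when set, else <root_dir>/descriptor.index."""
    if config.faiss_index_path:
        return Path(config.faiss_index_path)
    return Path(config.root_dir) / "descriptor.index"
```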

Tests. 21 tests in tests/unit/c6_tile_cache/test_faiss_descriptor_index.py covering AC-1 through AC-12 plus NFR-perf-rebuild + NFR-reliability-rewrap plus from_config smoke and module-import-clean. Tests use the real faiss-cpu dep — no fake-FAISS shim. Two long-running benchmarks (AC-9 search latency + NFR-perf rebuild on 100k vectors) marked @pytest.mark.slow and deferred to CI.

Spec rewrite. _docs/02_tasks/done/AZ-306_c6_faiss_descriptor_index.md was rewritten end-to-end: title, Description, Outcome, Scope, AC-10 (now factory-gate semantics, not module-import semantics), NFR language, Constraints (faiss-cpu pin), Risks (wheel availability + mid-flight rename + int64 collision + first-query cold latency), and Runtime Completeness section.

Doc sync.

  • _docs/02_document/module-layout.md: internal-files list updated to name faiss_descriptor_index.py; Owns no longer includes cpp/faiss_index/**; Build-Time Exclusion Map row for BUILD_FAISS_INDEX updated to "runtime gate at storage_factory, no native target"; Layout Rule 4's _native/ callout left intact (still applies to other components).
  • _docs/02_document/components/08_c6_tile_cache/description.md: § 5 Key Dependencies row for FAISS rewritten as faiss-cpu PyPI wheel >=1.7,<2.0.
  • cpp/CMakeLists.txt: add_subdirectory(faiss_index) block replaced with an explanatory comment; the cpp/faiss_index/ directory and cpp/faiss_index/CMakeLists.txt placeholder removed.
  • cmake/build_options.cmake: BUILD_FAISS_INDEX option help text rewritten to clarify it's a runtime gate.
  • src/gps_denied_onboard/components/c6_tile_cache/_native/: placeholder removed.

Environmental fix. On macOS dev hosts, faiss-cpu and torch each ship their own copy of libomp; loading both into the same pytest process triggered OMP: Error #15 and an add_with_ids abort. Added os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE") at the top of tests/conftest.py (BEFORE any other import), which is the documented Intel OpenMP "duplicate-load tolerated" remediation. setdefault leaves any value already exported in the environment untouched, and the setting is effectively a no-op on CI Linux, where the LLVM libomp loader is shared correctly.
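
For reference, the top of tests/conftest.py would look roughly like the following. The report gives only the setdefault call, so the surrounding comments are illustrative.

```python
import os

# Must run before faiss or torch is imported anywhere in the test session:
# the OpenMP runtime reads this variable when the second libomp copy loads.
# setdefault leaves an explicitly exported value untouched.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
```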

Test results

  • c6_tile_cache scope: 124 passed, 57 skipped (Docker-required Postgres tests), 2 deselected (@slow benchmarks).
  • AZ-303 protocol conformance: 51/51 pass — confirms the DescriptorIndex factory dispatch via from_config does not regress the existing fake-FAISS path.
  • Full project (non-slow): 1334 passed, 79 skipped, 1 pre-existing failure.

The pre-existing failure is tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure: the OKVIS2 git submodule (cpp/okvis2/upstream/external/opengv/) is not initialized in this dev environment. Verified pre-existing via git stash diff. Unrelated to AZ-306.

Decisions ledger

| Decision | Rationale | Recorded in |
|---|---|---|
| faiss-cpu PyPI wheel over custom pybind11 wrapper | research fact #92 + ARM64 wheel availability + zero loss of capability vs. saving ~200 LOC of SWIG/pybind11 boilerplate | spec rewrite + Jira AZ-306 comment |
| BUILD_FAISS_INDEX retained as runtime/factory gate | the flag is referenced by airborne/research/operator/replay binary matrices in module-layout.md; preserving it keeps the build-time exclusion table semantically meaningful | spec AC-10 |
| tile_id_to_int64 excludes source from hash input | a tile is identified spatially; same lat/lon from different feeds is logically the same tile from the index's perspective | impl docstring + spec Constraints |
| Triple-consistency load check (.index, .sha256, meta.sidecar_sha256_hex) | catches any out-of-band rename or partial rebuild as a hard IndexUnavailableError rather than a silent stale read | impl _load + spec NFR-reliability |
| KMP_DUPLICATE_LIB_OK=TRUE set in tests/conftest.py | macOS-only dev-host issue; Intel-documented remediation; setdefault keeps it no-op on CI | conftest comment |
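
The triple-consistency load check recorded in the ledger can be sketched without FAISS at all, since it only compares digests. This is an illustration under stated assumptions: the sidecar is taken to be <index>.sha256 holding a bare hex digest, the metadata file is taken to be <index>.meta.json with a sidecar_sha256_hex key, and both verify_triple_consistency and the local IndexUnavailableError class are hypothetical stand-ins rather than the project's own code.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Local stand-in for the project's IndexUnavailableError (assumption)."""


def verify_triple_consistency(index_path: Path) -> str:
    """Return the index digest if all three records agree, else raise."""
    sidecar_path = index_path.with_name(index_path.name + ".sha256")
    meta_path = index_path.with_name(index_path.name + ".meta.json")
    try:
        actual = hashlib.sha256(index_path.read_bytes()).hexdigest()
        sidecar = sidecar_path.read_text(encoding="utf-8").strip()
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        recorded = meta["sidecar_sha256_hex"]
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # Missing file, unreadable JSON, or missing key: treat as unavailable.
        raise IndexUnavailableError(f"cannot read index artifacts: {exc}") from exc
    if not (actual == sidecar == recorded):
        raise IndexUnavailableError(
            f"digest mismatch: index={actual} sidecar={sidecar} meta={recorded}"
        )
    return actual
```

Running a check of this shape before opening the index means a partial rebuild or an out-of-band rename surfaces as one explicit error type instead of a stale read.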

Leftovers

None added. The only known dev-environment leftover, D-CROSS-CVE-1 (opencv-python 4.12 vs gtsam numpy-1.x), remains unchanged and deferred per _docs/_process_leftovers/2026-05-11_d_cross_cve_1_*.

Files changed

  • src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py: created (~480 LOC)
  • tests/unit/c6_tile_cache/test_faiss_descriptor_index.py: created (21 tests)
  • tests/unit/c6_tile_cache/test_protocol_conformance.py: _FakeFaissDescriptorIndex.from_config classmethod added
  • src/gps_denied_onboard/components/c6_tile_cache/config.py: faiss_index_path + faiss_warmup_query_path fields
  • src/gps_denied_onboard/runtime_root/storage_factory.py: switched to FaissDescriptorIndex.from_config(config)
  • pyproject.toml: promoted faiss-cpu>=1.7,<2.0 to main deps
  • tests/conftest.py: KMP_DUPLICATE_LIB_OK=TRUE set early
  • cpp/CMakeLists.txt: add_subdirectory(faiss_index) removed
  • cmake/build_options.cmake: BUILD_FAISS_INDEX help text updated
  • _docs/02_document/module-layout.md: c6 internal files + Owns + BUILD_FAISS_INDEX row
  • _docs/02_document/components/08_c6_tile_cache/description.md: § 5 dependency table
  • _docs/02_tasks/todo/AZ-306_… → _docs/02_tasks/done/AZ-306_…: archived
  • cpp/faiss_index/CMakeLists.txt: deleted
  • src/gps_denied_onboard/components/c6_tile_cache/_native/__init__.py: deleted