Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0).

Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and a FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory.

The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end to end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to work around the macOS faiss/torch libomp duplicate-load abort during pytest (no effect on CI Linux).

21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; the AZ-303 protocol-conformance fake gains a from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing, unrelated OKVIS2 submodule failure.

Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 35 / Cycle 1 — Implementation Report
- Date: 2026-05-13
- Tasks: AZ-306 (C6 FaissDescriptorIndex)
- Story points landed: 5
- Status: complete (AZ-306 → In Testing)
Scope summary
Single-task batch landing the production-default DescriptorIndex
strategy for C6 — closing the gap left open by the AZ-303 protocol
contract (which only shipped the Protocol + factory) and unblocking
AZ-322 (C10 Descriptor Batcher), AZ-341 (C2 FAISS HNSW Retrieve
Wiring), and downstream c2_vpr/c2_5_rerank/c3_matcher tile-cache
consumers.
AZ-306 — C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Architectural change vs. original spec. The original task
description called for a custom pybind11 wrapper over a copy of
FAISS HEAD vendored under cpp/faiss_index/. During Step 0 of the
implement skill, the spec was cross-checked against three existing
project artifacts:
- In _docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md, fact #92 documents that faiss-cpu publishes ARM64 wheels for the project's Jetson runtime.
- pyproject.toml already carried "faiss-cpu>=1.7,<2.0" in the [indexing] extras group, i.e. the wheel was the planned acquisition path all along.
- cpp/faiss_index/CMakeLists.txt was a 6-line placeholder with no real source; no vendored copy of FAISS HEAD existed in the tree.
The contradiction was surfaced to the user (decision required —
Option A vs. B vs. C). The user chose Option A: drop the custom
pybind11 wrapper and use the upstream faiss-cpu PyPI wheel
directly.
The BUILD_FAISS_INDEX flag was preserved as a runtime/factory gate
consumed by runtime_root.storage_factory.build_descriptor_index;
it no longer maps to a CMake build target.
Implementation. Pure-Python FaissDescriptorIndex at
src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py
implementing the AZ-303 DescriptorIndex Protocol surface end-to-end:
- Search: IndexHNSWFlat(M=32) + IndexIDMap2, with efSearch applied to the wrapped HNSW at load time. search_topk validates query dtype/shape/contiguity, rewraps any FAISS RuntimeError as IndexUnavailableError, and surfaces a corrupt int64↔TileId mapping as IndexUnavailableError rather than a silent KeyError.
- Rebuild: atomic three-file write under Sha256Sidecar: <index> first (via faiss.write_index to a .tmp, then Sha256Sidecar.write_atomic_and_sidecar), <index>.meta.json second (typed IndexMetadata + tile-id mapping). Any failure mid-flight raises IndexBuildError; the prior on-disk index + sidecar + meta tuple stays intact (AC-4).
- Load: triple-consistency check, sha256(.index) == .sha256 sidecar text == meta.json::sidecar_sha256_hex. Any divergence raises IndexUnavailableError. The index is opened with faiss.read_index(IO_FLAG_MMAP | IO_FLAG_READ_ONLY).
- Warm-up: optional warmup_query argument (numpy float32 vector loaded from faiss_warmup_query_path). Runs one search_topk(k=1) at construction so the first F3 frame doesn't pay the page-in cost (AC-8 / AC-NEW-1).
- Composition root entry: FaissDescriptorIndex.from_config(config) classmethod mirrors PostgresFilesystemStore.from_config and is wired into runtime_root.storage_factory.build_descriptor_index.
- Int64-id mapping: int.from_bytes(sha256(f"{zoom}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True), with explicit collision detection at rebuild time (AC-11). The tile source field is intentionally NOT in the hash input: a tile is identified by position, not by feed.
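A minimal sketch of the int64-id mapping and the rebuild-time collision check. It follows the formula above but assumes a simple frozen TileId with zoom/lat/lon/source fields and UTF-8 encoding of the key string; the report gives neither detail. TileId, IdCollisionError, build_id_mapping and its id_fn parameter are hypothetical names, and id_fn exists only so the collision path can be exercised.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TileId:
    """Hypothetical stand-in for the project's TileId type."""

    zoom: int
    lat: float
    lon: float
    source: str = ""  # feed name; deliberately NOT part of the hash input


def position_key(tile: TileId) -> str:
    # Key format from the report: zoom, then lat/lon rounded to 8 decimals.
    return f"{tile.zoom}|{tile.lat:.8f}|{tile.lon:.8f}"


def tile_id_to_int64(tile: TileId) -> int:
    # First 8 bytes of SHA-256, read big-endian as a signed int64 (FAISS ids are int64).
    digest = hashlib.sha256(position_key(tile).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class IdCollisionError(ValueError):
    """Hypothetical error: two distinct positions mapped to one int64 id."""


def build_id_mapping(
    tiles: list[TileId],
    id_fn: Callable[[TileId], int] = tile_id_to_int64,
) -> dict[int, TileId]:
    mapping: dict[int, TileId] = {}
    for tile in tiles:
        faiss_id = id_fn(tile)
        existing = mapping.get(faiss_id)
        if existing is None:
            mapping[faiss_id] = tile
        elif position_key(existing) != position_key(tile):
            # Genuine 64-bit collision: fail the rebuild instead of overwriting.
            raise IdCollisionError(
                f"id {faiss_id} shared by {position_key(existing)} and {position_key(tile)}"
            )
        # Same position from another feed is the same tile; keep the first (assumption).
    return mapping
```

Keeping the first entry for a repeated position follows the ledger's "same lat/lon is the same tile" rule; whether the real rebuild keeps the first or the last vector is not stated in this report.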
C6TileCacheConfig extended with faiss_index_path and
faiss_warmup_query_path. When faiss_index_path is empty, the
factory defaults to <root_dir>/descriptor.index.
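The default-path rule and the factory gate can be sketched as follows. TileCacheConfigSketch, resolve_faiss_index_path and build_descriptor_index_sketch are hypothetical names; passing BUILD_FAISS_INDEX in as a boolean argument, and raising when it is off, are assumptions made to keep the example self-contained, since the report does not say how the real factory reads the flag or what it does when the flag is disabled.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class TileCacheConfigSketch:
    """Hypothetical subset of C6TileCacheConfig: only the fields named in the report."""

    root_dir: str
    faiss_index_path: str = ""  # empty means "use <root_dir>/descriptor.index"
    faiss_warmup_query_path: str = ""  # empty means "no warm-up query"


def resolve_faiss_index_path(config: TileCacheConfigSketch) -> Path:
    if config.faiss_index_path:
        return Path(config.faiss_index_path)
    return Path(config.root_dir) / "descriptor.index"


def build_descriptor_index_sketch(
    config: TileCacheConfigSketch, index_cls: Any, build_faiss_index: bool
) -> Any:
    # BUILD_FAISS_INDEX acts as a plain runtime switch: no native build target is involved.
    if not build_faiss_index:
        raise RuntimeError("BUILD_FAISS_INDEX is disabled for this binary")
    # Composition-root style: the strategy class builds itself from config.
    return index_cls.from_config(config)
```

Any class exposing a from_config classmethod (the real FaissDescriptorIndex or the protocol-conformance fake) can be passed as index_cls.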
Tests. 21 tests in
tests/unit/c6_tile_cache/test_faiss_descriptor_index.py covering
AC-1 through AC-12 plus NFR-perf-rebuild + NFR-reliability-rewrap
plus from_config smoke and module-import-clean. Tests use the real
faiss-cpu dependency; there is no fake-FAISS shim. Two long-running
benchmarks (AC-9 search latency + NFR-perf rebuild on 100k vectors)
are marked @pytest.mark.slow and deferred to CI.
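The rewrap behaviour that NFR-reliability-rewrap exercises can be illustrated without FAISS installed. This is a simplified sketch, not the module's search_topk: raw_search is a hypothetical callable standing in for the wrapped index's search call, it returns a single query's results as plain lists, and the dtype/shape/contiguity validation is omitted because it needs NumPy.

```python
from typing import Callable, Sequence


class IndexUnavailableError(RuntimeError):
    """Stand-in for the project's IndexUnavailableError."""


# Hypothetical signature: (query, k) -> (distances, int64 ids) for one query.
RawSearch = Callable[[Sequence[float], int], tuple[list[float], list[int]]]


def search_topk_sketch(
    raw_search: RawSearch,
    id_to_tile: dict[int, str],
    query: Sequence[float],
    k: int,
) -> list[tuple[str, float]]:
    try:
        distances, ids = raw_search(query, k)
    except RuntimeError as exc:
        # The faiss Python bindings surface C++-side failures as RuntimeError.
        raise IndexUnavailableError("descriptor index search failed") from exc
    hits: list[tuple[str, float]] = []
    for dist, faiss_id in zip(distances, ids):
        if faiss_id == -1:
            # FAISS pads results with id -1 when fewer than k neighbours exist.
            continue
        try:
            tile = id_to_tile[faiss_id]
        except KeyError as exc:
            # A corrupt id mapping means the index is unusable, not a silent KeyError.
            raise IndexUnavailableError(f"id {faiss_id} missing from tile mapping") from exc
        hits.append((tile, dist))
    return hits
```

A test in this style only needs a fake raw_search that either returns canned results or raises RuntimeError, which is why the real suite can still run against genuine faiss-cpu for everything else.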
Spec rewrite. _docs/02_tasks/done/AZ-306_c6_faiss_descriptor_index.md
was rewritten end-to-end: title, Description, Outcome, Scope, AC-10
(now factory-gate semantics, not module-import semantics), NFR
language, Constraints (faiss-cpu pin), Risks (wheel availability +
mid-flight rename + int64 collision + first-query cold latency), and
the Runtime Completeness section.
Doc sync.
- _docs/02_document/module-layout.md: internal-files list updated to name faiss_descriptor_index.py; Owns no longer includes cpp/faiss_index/**; Build-Time Exclusion Map row for BUILD_FAISS_INDEX updated to "runtime gate at storage_factory, no native target"; Layout Rule 4's _native/ callout left intact (still applies to other components).
- _docs/02_document/components/08_c6_tile_cache/description.md § 5 Key Dependencies row for FAISS rewritten: faiss-cpu PyPI wheel >=1.7,<2.0.
- cpp/CMakeLists.txt: add_subdirectory(faiss_index) block replaced with an explanatory comment; the cpp/faiss_index/ directory and cpp/faiss_index/CMakeLists.txt placeholder removed.
- cmake/build_options.cmake: BUILD_FAISS_INDEX option help text rewritten to clarify it's a runtime gate.
- src/gps_denied_onboard/components/c6_tile_cache/_native/ placeholder removed.
Environmental fix. On macOS dev hosts, faiss-cpu and torch
each ship their own copy of libomp; loading both into the same
pytest process triggered OMP: Error #15 and an add_with_ids
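abort. Added os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
at the top of tests/conftest.py (BEFORE any other import). This is the
workaround named in the OMP: Error #15 message itself, which describes
it as unsafe and unsupported; it is set only in the pytest process.
setdefault leaves any value already exported by the environment
untouched, and on CI Linux, where the duplicate-load abort does not
occur, the variable has no effect.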
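A minimal sketch of that conftest.py prologue. Only the setdefault call comes from the report; the comments and the placement of later imports are illustrative.

```python
# tests/conftest.py: this must run before anything imports faiss or torch.
import os

# macOS dev hosts: faiss-cpu and torch each bundle their own libomp, and loading
# both aborts with "OMP: Error #15". This flag lets the second runtime load anyway.
# setdefault keeps any value already exported by the developer or CI.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

# Imports that may pull in an OpenMP runtime go below this line, e.g.:
# import faiss
# import torch
```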
Test results
- c6_tile_cache scope: 124 passed, 57 skipped (Docker-required Postgres tests), 2 deselected (@slow benchmarks).
- AZ-303 protocol conformance: 51/51 pass, confirming that DescriptorIndex factory dispatch via from_config does not regress the existing fake-FAISS path.
- Full project (non-slow): 1334 passed, 79 skipped, 1 pre-existing failure.
The pre-existing failure is
tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure,
caused by the OKVIS2 git submodule
(cpp/okvis2/upstream/external/opengv/) not being initialized in this
dev environment. Verified as pre-existing via git stash diff.
Unrelated to AZ-306.
Decisions ledger
| Decision | Rationale | Recorded in |
|---|---|---|
| faiss-cpu PyPI wheel over custom pybind11 wrapper | research fact #92 (ARM64 wheel availability); no loss of capability relative to a custom wrapper; saves ~200 LOC of SWIG/pybind11 boilerplate | spec rewrite + Jira AZ-306 comment |
| BUILD_FAISS_INDEX retained as runtime/factory gate | the flag is referenced by the airborne/research/operator/replay binary matrices in module-layout.md; preserving it keeps the build-time exclusion table semantically meaningful | spec AC-10 |
| tile_id_to_int64 excludes source from hash input | a tile is identified spatially; the same lat/lon from different feeds is logically the same tile from the index's perspective | impl docstring + spec Constraints |
| Triple-consistency load check (.index ↔ .sha256 ↔ meta.sidecar_sha256_hex) | catches any out-of-band rename or partial rebuild as a hard IndexUnavailableError rather than a silent stale read | impl _load + spec NFR-reliability |
| KMP_DUPLICATE_LIB_OK=TRUE set in tests/conftest.py | macOS-only dev-host issue; workaround named in the Intel OpenMP Error #15 message (which labels it unsafe and unsupported); setdefault preserves any pre-set value, and the flag has no effect on CI Linux | conftest comment |
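The atomic rebuild and the triple-consistency load check recorded above can be sketched with the standard library alone. Assumptions, none of them confirmed by this report: the index arrives as serialized bytes (in the real module it comes from faiss.write_index to a .tmp), the sidecar is named <index>.sha256 and holds only the hex digest, and the metadata is plain JSON. rebuild, verify_triple and the stand-in exception classes are hypothetical names.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


class IndexBuildError(RuntimeError):
    """Stand-in for the project's IndexBuildError."""


class IndexUnavailableError(RuntimeError):
    """Stand-in for the project's IndexUnavailableError."""


def _sidecar(index_path: Path) -> Path:
    return index_path.with_name(index_path.name + ".sha256")


def _meta(index_path: Path) -> Path:
    return index_path.with_name(index_path.name + ".meta.json")


def _stage(target: Path, data: bytes) -> str:
    """Write data to an fsync'd temp file next to target and return its path."""
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
    except BaseException:
        os.unlink(tmp)
        raise
    return tmp


def rebuild(index_path: Path, index_bytes: bytes, tile_ids: dict[str, str]) -> None:
    digest = hashlib.sha256(index_bytes).hexdigest()
    meta = {"sidecar_sha256_hex": digest, "tile_ids": tile_ids}
    targets = [
        (index_path, index_bytes),
        (_sidecar(index_path), digest.encode("ascii")),
        (_meta(index_path), json.dumps(meta).encode("utf-8")),
    ]
    staged: list[tuple[str, Path]] = []
    try:
        # Phase 1: stage all three files; the live tuple is untouched if this fails.
        for target, data in targets:
            staged.append((_stage(target, data), target))
        # Phase 2: rename into place; each os.replace is atomic on POSIX.
        for tmp, target in staged:
            os.replace(tmp, target)
    except OSError as exc:
        for tmp, _ in staged:
            if os.path.exists(tmp):
                os.unlink(tmp)
        raise IndexBuildError(f"rebuild of {index_path} failed") from exc


def _sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def verify_triple(index_path: Path) -> dict:
    """Return the metadata only if index, sidecar and meta digests all agree."""
    try:
        actual = _sha256_file(index_path)
        recorded = _sidecar(index_path).read_text(encoding="ascii").strip()
        meta = json.loads(_meta(index_path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        raise IndexUnavailableError(f"cannot read index tuple for {index_path}") from exc
    if not (actual == recorded == meta.get("sidecar_sha256_hex")):
        raise IndexUnavailableError(f"digest mismatch across the {index_path} tuple")
    return meta
```

Staging all three files before any rename keeps the previous tuple intact if staging fails, but the three renames are still not one atomic operation, so a crash between them is caught by verify_triple rather than prevented.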
Leftovers
None added. The only known dev-environment leftover, D-CROSS-CVE-1
(opencv-python 4.12 vs gtsam numpy-1.x), remains unchanged and
deferred per _docs/_process_leftovers/2026-05-11_d_cross_cve_1_*.
Files changed
- src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py: created (~480 LOC)
- tests/unit/c6_tile_cache/test_faiss_descriptor_index.py: created (21 tests)
- tests/unit/c6_tile_cache/test_protocol_conformance.py: _FakeFaissDescriptorIndex.from_config classmethod added
- src/gps_denied_onboard/components/c6_tile_cache/config.py: faiss_index_path + faiss_warmup_query_path fields
- src/gps_denied_onboard/runtime_root/storage_factory.py: switched to FaissDescriptorIndex.from_config(config)
- pyproject.toml: promoted faiss-cpu>=1.7,<2.0 to main deps
- tests/conftest.py: KMP_DUPLICATE_LIB_OK=TRUE set early
- cpp/CMakeLists.txt: add_subdirectory(faiss_index) removed
- cmake/build_options.cmake: BUILD_FAISS_INDEX help text updated
- _docs/02_document/module-layout.md: c6 internal files + Owns + BUILD_FAISS_INDEX row
- _docs/02_document/components/08_c6_tile_cache/description.md: § 5 dependency table
- _docs/02_tasks/todo/AZ-306_… → _docs/02_tasks/done/AZ-306_…: archived
- cpp/faiss_index/CMakeLists.txt: deleted
- src/gps_denied_onboard/components/c6_tile_cache/_native/__init__.py: deleted