[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)

Production-default DescriptorIndex strategy backed by the faiss-cpu
PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end
to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild
(.index + .sha256 sidecar + .meta.json), triple-consistency load
check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional
warm-up query at construction, FAISS RuntimeError rewrap to
IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config
classmethod wired into runtime_root.storage_factory.

The original spec required a custom pybind11 wrapper over a vendored
FAISS HEAD; the user opted for the upstream faiss-cpu wheel after
research fact #92 confirmed ARM64 wheel availability for Jetson and
the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/
placeholder removed; BUILD_FAISS_INDEX flag retained as a
runtime/factory gate (no native target). Spec rewritten end-to-end and
archived to _docs/02_tasks/done/.

C6TileCacheConfig extended with faiss_index_path and
faiss_warmup_query_path fields. tests/conftest.py sets
KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp
duplicate-load abort during pytest (no-op on CI Linux). 21 new tests
cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance
fake updated with from_config classmethod.

Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1
pre-existing OKVIS2 submodule failure unrelated.

Doc sync: module-layout.md, components/08_c6_tile_cache/description.md
§5, batch_35_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 04:01:37 +03:00
parent ecf76d762d
commit 3b7265757b
17 changed files with 1550 additions and 87 deletions
@@ -0,0 +1,175 @@
# Batch 35 / Cycle 1 — Implementation Report
**Date**: 2026-05-13
**Tasks**: AZ-306 (C6 FaissDescriptorIndex)
**Story points landed**: 5
**Status**: complete (AZ-306 → In Testing)
## Scope summary
Single-task batch landing the production-default `DescriptorIndex`
strategy for C6 — closing the gap left open by the AZ-303 protocol
contract (which only shipped the Protocol + factory) and unblocking
AZ-322 (C10 Descriptor Batcher), AZ-341 (C2 FAISS HNSW Retrieve
Wiring), and downstream c2_vpr/c2_5_rerank/c3_matcher tile-cache
consumers.
### AZ-306 — C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
**Architectural change vs. original spec.** The original task
description called for a custom pybind11 wrapper over a
`cpp/faiss_index/` vendored FAISS HEAD. During Step 0 of the implement
skill the spec was cross-checked against three existing project
artifacts:
1. `_docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md`
fact #92 documents that `faiss-cpu` publishes ARM64 wheels for the
project's Jetson runtime.
2. `pyproject.toml` already carried `"faiss-cpu>=1.7,<2.0"` in the
`[indexing]` extras group — i.e. the wheel was the planned
acquisition path all along.
3. `cpp/faiss_index/CMakeLists.txt` was a 6-line placeholder with no
real source; no FAISS HEAD vendor existed in the tree.
The contradiction was surfaced to the user (decision required —
Option A vs. B vs. C). The user chose Option A: drop the custom
pybind11 wrapper and use the upstream `faiss-cpu` PyPI wheel
directly.
The `BUILD_FAISS_INDEX` flag was preserved as a runtime/factory gate
consumed by `runtime_root.storage_factory.build_descriptor_index`;
it no longer maps to a CMake build target.
**Implementation.** Pure-Python `FaissDescriptorIndex` at
`src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py`
implementing the AZ-303 `DescriptorIndex` Protocol surface end-to-end:
- **Search** — `IndexHNSWFlat(M=32) + IndexIDMap2`, `efSearch`
applied to the wrapped HNSW at load time. `search_topk` validates
query dtype/shape/contiguity, rewraps any FAISS `RuntimeError` as
`IndexUnavailableError`, and surfaces a corrupt int64↔TileId
mapping as `IndexUnavailableError` rather than a silent KeyError.
- **Rebuild** — atomic three-file write under `Sha256Sidecar`:
`<index>` first (via `faiss.write_index` to a `.tmp`, then
`Sha256Sidecar.write_atomic_and_sidecar`), `<index>.meta.json`
second (typed `IndexMetadata` + tile-id mapping). Any mid-flight
failure raises `IndexBuildError`; the prior on-disk index + sidecar
+ meta tuple stays intact (AC-4).
- **Load** — triple-consistency check: `sha256(.index)` ==
`.sha256` sidecar text == `meta.json::sidecar_sha256_hex`. Any
divergence raises `IndexUnavailableError`. Index opened with
`faiss.read_index(IO_FLAG_MMAP | IO_FLAG_READ_ONLY)`.
- **Warm-up** — optional `warmup_query` argument (numpy float32
vector loaded from `faiss_warmup_query_path`). Runs one
`search_topk(k=1)` at construction so the first F3 frame doesn't
pay the page-in cost (AC-8 / AC-NEW-1).
- **Composition root entry** — `FaissDescriptorIndex.from_config(config)`
classmethod mirrors `PostgresFilesystemStore.from_config` and is
wired into `runtime_root.storage_factory.build_descriptor_index`.
- **Int64-id mapping** —
`int.from_bytes(sha256(f"{zoom}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True)`
with explicit collision detection at rebuild time (AC-11). Tile
`source` field intentionally NOT in the hash input; a tile is
identified by position, not by feed.
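
The int64-id mapping in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the shipped code: `TileKey` and `build_id_mapping` are hypothetical names, UTF-8 encoding of the hash input is an assumption, and a plain `ValueError` stands in for whatever error the real rebuild raises on collision.

```python
import hashlib
from typing import Dict, Iterable, NamedTuple


class TileKey(NamedTuple):
    """Hypothetical stand-in for the project's TileId (position only)."""
    zoom: int
    lat: float
    lon: float


def tile_id_to_int64(key: TileKey) -> int:
    # Hash only zoom/lat/lon; the tile's source feed is deliberately excluded.
    text = f"{key.zoom}|{key.lat:.8f}|{key.lon:.8f}"
    # Assumption: the string is UTF-8 encoded before hashing.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # First 8 bytes, big-endian, signed, so the value fits a FAISS int64 id.
    return int.from_bytes(digest[:8], "big", signed=True)


def build_id_mapping(keys: Iterable[TileKey]) -> Dict[int, TileKey]:
    mapping: Dict[int, TileKey] = {}
    for key in keys:
        int_id = tile_id_to_int64(key)
        existing = mapping.get(int_id)
        # Two different tiles landing on the same int64 is a hard error.
        if existing is not None and existing != key:
            raise ValueError(f"int64 id collision: {existing} vs {key}")
        mapping[int_id] = key
    return mapping
```

Because the source feed is not part of the key, the same position reported by two feeds maps to one id and is not treated as a collision in this sketch.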
**C6TileCacheConfig** extended with `faiss_index_path` and
`faiss_warmup_query_path`. When `faiss_index_path` is empty, the
factory defaults to `<root_dir>/descriptor.index`.
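
A small sketch of the default-path rule described above. Only the two `faiss_*` field names come from this report; the dataclass, the `root_dir` field name, and `resolve_index_path` are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TileCacheConfigSketch:
    # Hypothetical subset of C6TileCacheConfig, for illustration only.
    root_dir: str
    faiss_index_path: str = ""
    faiss_warmup_query_path: str = ""


def resolve_index_path(config: TileCacheConfigSketch) -> Path:
    # An explicit path wins; an empty string falls back to <root_dir>/descriptor.index.
    if config.faiss_index_path:
        return Path(config.faiss_index_path)
    return Path(config.root_dir) / "descriptor.index"
```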
**Tests.** 21 tests in
`tests/unit/c6_tile_cache/test_faiss_descriptor_index.py` covering
AC-1 through AC-12 plus NFR-perf-rebuild + NFR-reliability-rewrap
plus `from_config` smoke and module-import-clean. Tests use the real
`faiss-cpu` dep — no fake-FAISS shim. Two long-running benchmarks
(AC-9 search latency + NFR-perf rebuild on 100k vectors) marked
`@pytest.mark.slow` and deferred to CI.
**Spec rewrite.** `_docs/02_tasks/done/AZ-306_c6_faiss_descriptor_index.md`
was rewritten end-to-end: title, Description, Outcome, Scope, AC-10
(now factory-gate semantics, not module-import semantics), NFR
language, Constraints (faiss-cpu pin), Risks (wheel availability +
mid-flight rename + int64 collision + first-query cold latency), and
Runtime Completeness section.
**Doc sync.**
- `_docs/02_document/module-layout.md` — internal-files list updated
to name `faiss_descriptor_index.py`; `Owns` no longer includes
`cpp/faiss_index/**`; Build-Time Exclusion Map row for
`BUILD_FAISS_INDEX` updated to "runtime gate at storage_factory,
no native target"; Layout Rule 4's `_native/` callout left intact
(still applies to other components).
- `_docs/02_document/components/08_c6_tile_cache/description.md` § 5
Key Dependencies row for FAISS rewritten: `faiss-cpu` PyPI wheel
`>=1.7,<2.0`.
- `cpp/CMakeLists.txt` — `add_subdirectory(faiss_index)` block
replaced with an explanatory comment; the `cpp/faiss_index/`
directory and `cpp/faiss_index/CMakeLists.txt` placeholder
removed.
- `cmake/build_options.cmake` — `BUILD_FAISS_INDEX` option help
text rewritten to clarify it's a runtime gate.
- `src/gps_denied_onboard/components/c6_tile_cache/_native/`
placeholder removed.
**Environmental fix.** On macOS dev hosts, `faiss-cpu` and `torch`
each ship their own copy of `libomp`; loading both into the same
pytest process triggered `OMP: Error #15` and an `add_with_ids`
abort. Added `os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")`
at the top of `tests/conftest.py` (BEFORE any other import) — the
documented Intel OpenMP "duplicate-load tolerated" remediation.
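`setdefault` leaves any value already exported in the environment
untouched, and the setting has no effect on CI Linux, where the LLVM
`libomp` is shared correctly and no duplicate load occurs.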
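
The workaround amounts to one `setdefault` call made before `faiss` or `torch` is imported. The sketch below wraps it in a hypothetical helper, `tolerate_duplicate_openmp`, only so the behaviour can be checked against a plain dict; the actual `tests/conftest.py` calls `os.environ.setdefault` directly, as quoted above.

```python
import os
from typing import MutableMapping


def tolerate_duplicate_openmp(env: MutableMapping[str, str] = os.environ) -> None:
    # setdefault writes the key only when it is absent, so a value
    # already exported by the developer or by CI is never overridden.
    env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")


# Must run before any module that loads libomp (faiss, torch) is imported.
tolerate_duplicate_openmp()
```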
## Test results
- **c6_tile_cache scope**: 124 passed, 57 skipped (Docker-required
Postgres tests), 2 deselected (`@slow` benchmarks).
- **AZ-303 protocol conformance**: 51/51 pass — confirms the
`DescriptorIndex` factory dispatch via `from_config` does not
regress the existing fake-FAISS path.
- **Full project (non-slow)**: 1334 passed, 79 skipped, 1 pre-existing
failure.
The pre-existing failure is
`tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure`:
the OKVIS2 git submodule
(`cpp/okvis2/upstream/external/opengv/`) is not initialized in this
dev environment. Verified pre-existing via `git stash` diff.
Unrelated to AZ-306.
## Decisions ledger
| Decision | Rationale | Recorded in |
|---|---|---|
| `faiss-cpu` PyPI wheel over custom pybind11 wrapper | research fact #92 + ARM64 wheel availability + zero loss of capability while saving ~200 LOC of SWIG/pybind11 boilerplate | spec rewrite + Jira AZ-306 comment |
| `BUILD_FAISS_INDEX` retained as runtime/factory gate | the flag is referenced by airborne/research/operator/replay binary matrices in `module-layout.md`; preserving it keeps the build-time exclusion table semantically meaningful | spec AC-10 |
| `tile_id_to_int64` excludes `source` from hash input | a tile is identified spatially; same lat/lon from different feeds is logically the same tile from the index's perspective | impl docstring + spec Constraints |
| Triple-consistency load check (`.index` ↔ `.sha256` ↔ `meta.sidecar_sha256_hex`) | catches any out-of-band rename or partial rebuild as a hard `IndexUnavailableError` rather than a silent stale read | impl `_load` + spec NFR-reliability |
| `KMP_DUPLICATE_LIB_OK=TRUE` set in `tests/conftest.py` | macOS-only dev-host issue; Intel-documented remediation; `setdefault` keeps it no-op on CI | conftest comment |
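
The triple-consistency check from the ledger can be sketched with the standard library. This is a hedged illustration, not the project's `_load`: the `<index>.sha256` sidecar file name, the helper names, and the local `IndexUnavailableError` class are assumptions, and raw bytes stand in for a serialized FAISS index.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Local stand-in for the project's error type."""


def _write_atomic(path: Path, data: bytes) -> None:
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def write_index_triple(index_path: Path, index_bytes: bytes) -> str:
    # Index first, then the sidecar, then the metadata that repeats the digest.
    digest = hashlib.sha256(index_bytes).hexdigest()
    _write_atomic(index_path, index_bytes)
    _write_atomic(Path(f"{index_path}.sha256"), digest.encode("ascii"))
    meta = {"sidecar_sha256_hex": digest}
    _write_atomic(Path(f"{index_path}.meta.json"), json.dumps(meta).encode("utf-8"))
    return digest


def verify_index_triple(index_path: Path) -> str:
    try:
        actual = hashlib.sha256(index_path.read_bytes()).hexdigest()
        sidecar = Path(f"{index_path}.sha256").read_text().strip()
        meta = json.loads(Path(f"{index_path}.meta.json").read_text())
        recorded = meta["sidecar_sha256_hex"]
    except (OSError, ValueError, KeyError) as exc:
        raise IndexUnavailableError(f"index artifacts unreadable: {exc}") from exc
    # All three digests must agree, otherwise the index is refused.
    if not (actual == sidecar == recorded):
        raise IndexUnavailableError("index, sidecar and meta digests disagree")
    return actual
```

Each file in this sketch is replaced atomically on its own, but the three renames are not atomic as a set. That window is what the load-time check is for: an interrupted rebuild leaves digests that disagree, and the load fails loudly instead of serving a stale index.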
## Leftovers
None added. The only known dev-environment leftover, D-CROSS-CVE-1
(opencv-python 4.12 vs gtsam numpy-1.x), remains unchanged and
deferred per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_*`.
## Files changed
- `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py` — created (~480 LOC)
- `tests/unit/c6_tile_cache/test_faiss_descriptor_index.py` — created (21 tests)
- `tests/unit/c6_tile_cache/test_protocol_conformance.py` — `_FakeFaissDescriptorIndex.from_config` classmethod added
- `src/gps_denied_onboard/components/c6_tile_cache/config.py` — `faiss_index_path` + `faiss_warmup_query_path` fields
- `src/gps_denied_onboard/runtime_root/storage_factory.py` — switched to `FaissDescriptorIndex.from_config(config)`
- `pyproject.toml` — promoted `faiss-cpu>=1.7,<2.0` to main deps
- `tests/conftest.py` — `KMP_DUPLICATE_LIB_OK=TRUE` set early
- `cpp/CMakeLists.txt` — `add_subdirectory(faiss_index)` removed
- `cmake/build_options.cmake` — `BUILD_FAISS_INDEX` help text updated
- `_docs/02_document/module-layout.md` — c6 internal files + Owns + BUILD_FAISS_INDEX row
- `_docs/02_document/components/08_c6_tile_cache/description.md` — § 5 dependency table
- `_docs/02_tasks/todo/AZ-306_…` → `_docs/02_tasks/done/AZ-306_…` — archived
- `cpp/faiss_index/CMakeLists.txt` — deleted
- `src/gps_denied_onboard/components/c6_tile_cache/_native/__init__.py` — deleted