mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 00:11:13 +00:00
[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory. The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end-to-end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp duplicate-load abort during pytest (no-op on CI Linux). 21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance fake updated with from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing OKVIS2 submodule failure unrelated. Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,175 @@
# Batch 35 / Cycle 1 — Implementation Report

**Date**: 2026-05-13
**Tasks**: AZ-306 (C6 FaissDescriptorIndex)
**Story points landed**: 5
**Status**: complete (AZ-306 → In Testing)

## Scope summary

Single-task batch landing the production-default `DescriptorIndex`
strategy for C6 — closing the gap left open by the AZ-303 protocol
contract (which only shipped the Protocol + factory) and unblocking
AZ-322 (C10 Descriptor Batcher), AZ-341 (C2 FAISS HNSW Retrieve
Wiring), and downstream c2_vpr/c2_5_rerank/c3_matcher tile-cache
consumers.

### AZ-306 — C6 FaissDescriptorIndex (faiss-cpu, HNSW32)

**Architectural change vs. original spec.** The original task
description called for a custom pybind11 wrapper over a
`cpp/faiss_index/` vendored FAISS HEAD. During Step 0 of the implement
skill the spec was cross-checked against three existing project
artifacts:

1. `_docs/00_research/02_fact_cards/C6_tile_cache_spatial_index.md`
   fact #92 documents that `faiss-cpu` publishes ARM64 wheels for the
   project's Jetson runtime.
2. `pyproject.toml` already carried `"faiss-cpu>=1.7,<2.0"` in the
   `[indexing]` extras group — i.e. the wheel was the planned
   acquisition path all along.
3. `cpp/faiss_index/CMakeLists.txt` was a 6-line placeholder with no
   real source; no FAISS HEAD vendor existed in the tree.

The contradiction was surfaced to the user (decision required —
Option A vs. B vs. C). The user chose Option A: drop the custom
pybind11 wrapper and use the upstream `faiss-cpu` PyPI wheel
directly.

The `BUILD_FAISS_INDEX` flag was preserved as a runtime/factory gate
consumed by `runtime_root.storage_factory.build_descriptor_index`;
it no longer maps to a CMake build target.

**Implementation.** Pure-Python `FaissDescriptorIndex` at
`src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py`
implementing the AZ-303 `DescriptorIndex` Protocol surface end-to-end:

- **Search** — `IndexHNSWFlat(M=32) + IndexIDMap2`, `efSearch`
  applied to the wrapped HNSW at load time. `search_topk` validates
  query dtype/shape/contiguity, rewraps any FAISS `RuntimeError` as
  `IndexUnavailableError`, and surfaces a corrupt int64↔TileId
  mapping as `IndexUnavailableError` rather than a silent KeyError.
- **Rebuild** — atomic three-file write under `Sha256Sidecar`:
  `<index>` first (via `faiss.write_index` to a `.tmp`, then
  `Sha256Sidecar.write_atomic_and_sidecar`), `<index>.meta.json`
  second (typed `IndexMetadata` + tile-id mapping). Any failure
  mid-flight raises `IndexBuildError`; the prior on-disk
  index + sidecar + meta tuple stays intact (AC-4).
- **Load** — triple-consistency check: `sha256(.index)` ==
  `.sha256` sidecar text == `meta.json::sidecar_sha256_hex`. Any
  divergence raises `IndexUnavailableError`. Index opened with
  `faiss.read_index(<index>, IO_FLAG_MMAP | IO_FLAG_READ_ONLY)`.
- **Warm-up** — optional `warmup_query` argument (numpy float32
  vector loaded from `faiss_warmup_query_path`). Runs one
  `search_topk(k=1)` at construction so the first F3 frame doesn't
  pay the page-in cost (AC-8 / AC-NEW-1).
- **Composition root entry** — `FaissDescriptorIndex.from_config(config)`
  classmethod mirrors `PostgresFilesystemStore.from_config` and is
  wired into `runtime_root.storage_factory.build_descriptor_index`.
- **Int64-id mapping** —
  `int.from_bytes(sha256(f"{zoom}|{lat:.8f}|{lon:.8f}".encode()).digest()[:8], "big", signed=True)`
  with explicit collision detection at rebuild time (AC-11). Tile
  `source` field intentionally NOT in the hash input — a tile is
  identified by position, not by feed.

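The int64-id mapping in the last bullet can be sketched with the standard library alone. This is a minimal illustration, not the module's code: the `TileId` tuple, the `build_id_mapping` helper, and the local `IndexBuildError` class are hypothetical stand-ins, and keeping the first tile seen for a repeated position is an assumption.

```python
import hashlib
from typing import Dict, Iterable, NamedTuple, Tuple


class IndexBuildError(RuntimeError):
    """Stand-in for the project's error type of the same name."""


class TileId(NamedTuple):
    """Hypothetical tile key; the real TileId type is not shown in the report."""
    zoom: int
    lat: float
    lon: float
    source: str  # deliberately excluded from the hash input


def _position_key(tile: TileId) -> str:
    # Position formatted to 8 decimal places, as in the report's formula.
    return f"{tile.zoom}|{tile.lat:.8f}|{tile.lon:.8f}"


def tile_id_to_int64(tile: TileId) -> int:
    """First 8 bytes of SHA-256 over the position key, as a signed 64-bit int."""
    digest = hashlib.sha256(_position_key(tile).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_id_mapping(tiles: Iterable[TileId]) -> Dict[int, TileId]:
    """Build int64 -> TileId, raising if two different positions share an id."""
    seen: Dict[int, Tuple[str, TileId]] = {}
    for tile in tiles:
        int_id = tile_id_to_int64(tile)
        key = _position_key(tile)
        if int_id in seen and seen[int_id][0] != key:
            raise IndexBuildError(f"int64 id collision: {seen[int_id][0]!r} vs {key!r}")
        # Same position from another feed is the same tile: keep the first one.
        seen.setdefault(int_id, (key, tile))
    return {int_id: tile for int_id, (_, tile) in seen.items()}
```

Truncating to 8 bytes is what makes the id fit a FAISS int64 label; it also makes collisions possible in principle, which is why the rebuild step checks for them instead of assuming uniqueness.
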
**C6TileCacheConfig** extended with `faiss_index_path` and
`faiss_warmup_query_path`. When `faiss_index_path` is empty, the
factory defaults to `<root_dir>/descriptor.index`.

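The default-path rule and the factory gate can be illustrated roughly as follows. Everything beyond the two new field names, `root_dir`, `from_config`, and `build_descriptor_index` is an assumption: the stub class, the `faiss_enabled` keyword, and raising `RuntimeError` when the gate is off are hypothetical, since the report does not say how the `BUILD_FAISS_INDEX` value reaches the factory or what happens when it is disabled.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class C6TileCacheConfig:
    """Trimmed stand-in holding only the fields this report mentions."""
    root_dir: str
    faiss_index_path: str = ""
    faiss_warmup_query_path: str = ""


def resolve_index_path(config: C6TileCacheConfig) -> Path:
    """Use the configured path, or <root_dir>/descriptor.index when it is empty."""
    if config.faiss_index_path:
        return Path(config.faiss_index_path)
    return Path(config.root_dir) / "descriptor.index"


@dataclass(frozen=True)
class DescriptorIndexStub:
    """Hypothetical placeholder for FaissDescriptorIndex (no FAISS calls here)."""
    index_path: Path
    warmup_query_path: Optional[Path]

    @classmethod
    def from_config(cls, config: C6TileCacheConfig) -> "DescriptorIndexStub":
        warmup = config.faiss_warmup_query_path
        return cls(
            index_path=resolve_index_path(config),
            warmup_query_path=Path(warmup) if warmup else None,
        )


def build_descriptor_index(
    config: C6TileCacheConfig, *, faiss_enabled: bool
) -> DescriptorIndexStub:
    """Factory gate: `faiss_enabled` models BUILD_FAISS_INDEX (assumed plumbing)."""
    if not faiss_enabled:
        raise RuntimeError("BUILD_FAISS_INDEX is off; FAISS-backed index not built")
    return DescriptorIndexStub.from_config(config)
```

Keeping the path resolution in one small function means the empty-string default is decided in a single place, whichever caller constructs the index.
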
**Tests.** 21 tests in
`tests/unit/c6_tile_cache/test_faiss_descriptor_index.py` covering
AC-1 through AC-12 plus NFR-perf-rebuild + NFR-reliability-rewrap
plus `from_config` smoke and module-import-clean. Tests use the real
`faiss-cpu` dep — no fake-FAISS shim. Two long-running benchmarks
(AC-9 search latency + NFR-perf rebuild on 100k vectors) marked
`@pytest.mark.slow` and deferred to CI.

**Spec rewrite.** `_docs/02_tasks/done/AZ-306_c6_faiss_descriptor_index.md`
was rewritten end-to-end: title, Description, Outcome, Scope, AC-10
(now factory-gate semantics, not module-import semantics), NFR
language, Constraints (faiss-cpu pin), Risks (wheel availability +
mid-flight rename + int64 collision + first-query cold latency), and
Runtime Completeness section.

**Doc sync.**

- `_docs/02_document/module-layout.md` — internal-files list updated
  to name `faiss_descriptor_index.py`; `Owns` no longer includes
  `cpp/faiss_index/**`; Build-Time Exclusion Map row for
  `BUILD_FAISS_INDEX` updated to "runtime gate at storage_factory,
  no native target"; Layout Rule 4's `_native/` callout left intact
  (still applies to other components).
- `_docs/02_document/components/08_c6_tile_cache/description.md` § 5
  Key Dependencies row for FAISS rewritten: `faiss-cpu` PyPI wheel
  `>=1.7,<2.0`.
- `cpp/CMakeLists.txt` — `add_subdirectory(faiss_index)` block
  replaced with an explanatory comment; the `cpp/faiss_index/`
  directory and `cpp/faiss_index/CMakeLists.txt` placeholder
  removed.
- `cmake/build_options.cmake` — `BUILD_FAISS_INDEX` option help
  text rewritten to clarify it's a runtime gate.
- `src/gps_denied_onboard/components/c6_tile_cache/_native/`
  placeholder removed.

**Environmental fix.** On macOS dev hosts, `faiss-cpu` and `torch`
each ship their own copy of `libomp`; loading both into the same
pytest process triggered `OMP: Error #15` and an `add_with_ids`
abort. Added `os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")`
at the top of `tests/conftest.py` (BEFORE any other import), which is
the workaround named in the `OMP: Error #15` message itself. The
OpenMP runtime flags that workaround as unsafe and unsupported; here
it is set only in the test harness. `setdefault` preserves any value
already exported in the environment, and the setting has no effect
on CI Linux, where the LLVM `libomp` loader is shared correctly.

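A minimal sketch of the ordering this fix relies on, assuming nothing else runs ahead of `tests/conftest.py`: the variable has to be in the environment before the first `faiss` or `torch` import, because the OpenMP runtime reads it when the library is loaded.

```python
# tests/conftest.py (top of file; sketch only)
import os

# Must run before anything imports faiss or torch. setdefault leaves a value
# that the developer or CI has already exported untouched.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

# Imports that pull in faiss / torch, and the project's fixtures, follow here.
```
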
## Test results

- **c6_tile_cache scope**: 124 passed, 57 skipped (Docker-required
  Postgres tests), 2 deselected (`@slow` benchmarks).
- **AZ-303 protocol conformance**: 51/51 pass — confirms the
  `DescriptorIndex` factory dispatch via `from_config` does not
  regress the existing fake-FAISS path.
- **Full project (non-slow)**: 1334 passed, 79 skipped, 1 pre-existing
  failure.

The pre-existing failure is
`tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure` —
the OKVIS2 git submodule
(`cpp/okvis2/upstream/external/opengv/`) is not initialized in this
dev environment. Verified pre-existing via `git stash` diff.
Unrelated to AZ-306.

## Decisions ledger

| Decision | Rationale | Recorded in |
|---|---|---|
| `faiss-cpu` PyPI wheel over custom pybind11 wrapper | research fact #92 + ARM64 wheel availability + zero loss of capability, while saving ~200 LOC of SWIG/pybind11 boilerplate | spec rewrite + Jira AZ-306 comment |
| `BUILD_FAISS_INDEX` retained as runtime/factory gate | the flag is referenced by airborne/research/operator/replay binary matrices in `module-layout.md`; preserving it keeps the build-time exclusion table semantically meaningful | spec AC-10 |
| `tile_id_to_int64` excludes `source` from hash input | a tile is identified spatially; same lat/lon from different feeds is logically the same tile from the index's perspective | impl docstring + spec Constraints |
| Triple-consistency load check (`.index` ↔ `.sha256` ↔ `meta.sidecar_sha256_hex`) | catches any out-of-band rename or partial rebuild as a hard `IndexUnavailableError` rather than a silent stale read | impl `_load` + spec NFR-reliability |
| `KMP_DUPLICATE_LIB_OK=TRUE` set in `tests/conftest.py` | macOS-only dev-host issue; workaround named in the `OMP: Error #15` message; `setdefault` preserves any existing value and the setting has no effect on CI | conftest comment |

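The triple-consistency decision, together with the atomic write it guards, can be sketched without FAISS. This is an illustration under stated assumptions, not the project's `Sha256Sidecar` or `_load` code: the helper names are hypothetical, the index payload is plain bytes instead of a `faiss.write_index` output, the sidecar is assumed to live at `<index>.sha256` and hold a bare hex digest, and the metadata file carries only `sidecar_sha256_hex`.

```python
import hashlib
import json
import os
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Stand-in for the project's error type of the same name."""


def _atomic_write(path: Path, data: bytes) -> None:
    """Write a sibling .tmp file, fsync it, then rename it over the target."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem


def write_index_files(index_path: Path, payload: bytes) -> str:
    """Write index, sidecar, and metadata; return the payload's SHA-256 hex."""
    digest = hashlib.sha256(payload).hexdigest()
    _atomic_write(index_path, payload)
    _atomic_write(Path(f"{index_path}.sha256"), digest.encode("ascii"))
    meta = json.dumps({"sidecar_sha256_hex": digest}).encode("utf-8")
    _atomic_write(Path(f"{index_path}.meta.json"), meta)
    return digest


def verify_triple_consistency(index_path: Path) -> str:
    """Require sha256(index) == sidecar text == metadata field, else raise."""
    try:
        actual = hashlib.sha256(index_path.read_bytes()).hexdigest()
        sidecar = Path(f"{index_path}.sha256").read_text().strip()
        meta = json.loads(Path(f"{index_path}.meta.json").read_text())
        recorded = meta["sidecar_sha256_hex"]
    except (OSError, ValueError, KeyError) as exc:
        raise IndexUnavailableError(f"index files unreadable: {exc}") from exc
    if not (actual == sidecar == recorded):
        raise IndexUnavailableError("index, sidecar, and metadata digests diverge")
    return actual
```

Each rename is atomic on its own, but the three files are not replaced as a unit; a crash between renames leaves a mixed set on disk, and the load-time comparison is what turns that state into a hard error.
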
## Leftovers

None added. The only known dev-environment leftover, D-CROSS-CVE-1
(opencv-python 4.12 vs gtsam numpy-1.x), remains unchanged and
deferred per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_*`.

## Files changed

- `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py` — created (~480 LOC)
- `tests/unit/c6_tile_cache/test_faiss_descriptor_index.py` — created (21 tests)
- `tests/unit/c6_tile_cache/test_protocol_conformance.py` — `_FakeFaissDescriptorIndex.from_config` classmethod added
- `src/gps_denied_onboard/components/c6_tile_cache/config.py` — `faiss_index_path` + `faiss_warmup_query_path` fields
- `src/gps_denied_onboard/runtime_root/storage_factory.py` — switched to `FaissDescriptorIndex.from_config(config)`
- `pyproject.toml` — promoted `faiss-cpu>=1.7,<2.0` to main deps
- `tests/conftest.py` — `KMP_DUPLICATE_LIB_OK=TRUE` set early
- `cpp/CMakeLists.txt` — `add_subdirectory(faiss_index)` removed
- `cmake/build_options.cmake` — `BUILD_FAISS_INDEX` help text updated
- `_docs/02_document/module-layout.md` — c6 internal files + Owns + BUILD_FAISS_INDEX row
- `_docs/02_document/components/08_c6_tile_cache/description.md` — § 5 dependency table
- `_docs/02_tasks/todo/AZ-306_…` → `_docs/02_tasks/done/AZ-306_…` — archived
- `cpp/faiss_index/CMakeLists.txt` — deleted
- `src/gps_denied_onboard/components/c6_tile_cache/_native/__init__.py` — deleted