mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 12:41:13 +00:00
[AZ-306] C6 FaissDescriptorIndex (faiss-cpu, HNSW32)
Production-default DescriptorIndex strategy backed by the faiss-cpu PyPI wheel (>=1.7,<2.0). Implements the AZ-303 Protocol surface end to end: HNSW32 + IndexIDMap2 search, atomic three-file rebuild (.index + .sha256 sidecar + .meta.json), triple-consistency load check, mmap-backed reads with IO_FLAG_MMAP|IO_FLAG_READ_ONLY, optional warm-up query at construction, FAISS RuntimeError rewrap to IndexUnavailableError / IndexBuildError, and FaissDescriptorIndex.from_config classmethod wired into runtime_root.storage_factory.

The original spec required a custom pybind11 wrapper over a vendored FAISS HEAD; the user opted for the upstream faiss-cpu wheel after research fact #92 confirmed ARM64 wheel availability for Jetson and the existing pyproject.toml already pinned faiss-cpu. cpp/faiss_index/ placeholder removed; BUILD_FAISS_INDEX flag retained as a runtime/factory gate (no native target). Spec rewritten end-to-end and archived to _docs/02_tasks/done/. C6TileCacheConfig extended with faiss_index_path and faiss_warmup_query_path fields. tests/conftest.py sets KMP_DUPLICATE_LIB_OK=TRUE to remediate the macOS faiss/torch libomp duplicate-load abort during pytest (no-op on CI Linux).

21 new tests cover AC-1..12 + 2 NFRs + from_config smoke; AZ-303 protocol-conformance fake updated with from_config classmethod. Tests: 124/124 c6_tile_cache pass; 1334 project-wide pass; 1 pre-existing OKVIS2 submodule failure unrelated.

Doc sync: module-layout.md, components/08_c6_tile_cache/description.md §5, batch_35_cycle1_report.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,237 @@

# C6 FaissDescriptorIndex: HNSW Search + Atomic Rebuild (faiss-cpu)

**Task**: AZ-306_c6_faiss_descriptor_index
**Name**: C6 FaissDescriptorIndex
**Description**: Implement `FaissDescriptorIndex`, the production-default `DescriptorIndex` Protocol strategy. Owns the F1 pre-flight `rebuild_from_descriptors` path (atomic `.index` file write + AZ-280 SHA-256 sidecar + `.meta.json` sidecar), the F2 takeoff load (FAISS mmap-backed read via `faiss.read_index(path, IO_FLAG_MMAP | IO_FLAG_READ_ONLY)`), the F3 hot-path `search_topk` (HNSW; ≤ 5 ms p95 warm; sole runtime consumer is C2 VPR), and the `index_metadata` sidecar block. Implementation uses the upstream `faiss-cpu` PyPI wheel (research fact #92 / arch tech-stack pin): `faiss.IndexHNSWFlat` wrapped in `faiss.IndexIDMap2` for `(TileId, int64)` mapping; no project-side C++ vendor or pybind11 wrapper.
**Complexity**: 5 points
**Dependencies**: AZ-303_c6_storage_interfaces, AZ-280_sha256_sidecar, AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module
**Component**: c6_tile_cache (epic AZ-250 / E-C6)
**Tracker**: AZ-306
**Epic**: AZ-250 (E-C6)

### Document Dependencies

- `_docs/02_document/contracts/c6_tile_cache/descriptor_index.md`: Protocol this task implements; produced by AZ-303.
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: atomic-write + sidecar pattern for the `.index` file.
- `_docs/02_document/contracts/shared_config/composition_root_protocol.md`: `config.tile_cache.descriptor_index_runtime`, `config.tile_cache.faiss_index_path`, `config.tile_cache.faiss_warmup_query_path` fields.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: INFO/WARN log shapes for load + warm-up + corruption events.

## Problem

Without a real `FaissDescriptorIndex`:

- C2 VPR has no production retrieval path: `search_topk` is a hole; F3 hot path fails before C2.5.
- C10 CacheProvisioner has no production index builder: F1 pre-flight cannot persist a `.index` file; takeoff blocks.
- The F2 takeoff cold-start budget (AC-NEW-1 ≤ 12 s end-to-end) cannot be measured: without a real warm-up query, the first per-frame `search_topk` would pay the multi-second mmap page-in (description.md § 7).
- The `IndexUnavailableError` raise points (mismatched sidecar, dimension mismatch, missing/corrupt meta) are unenforced, so silent corruption is possible.
- The `BUILD_FAISS_INDEX=OFF` factory gate has no concrete impl to negative-test against beyond the AZ-303 fake.

This task is the production-default impl. The Protocol, contract, and the FAISS PyPI dependency (`faiss-cpu>=1.7,<2.0`) are now ready; this is the integration point.

## Outcome

- A `FaissDescriptorIndex` class at `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py` conforming to the `DescriptorIndex` Protocol from AZ-303.
- Pure-Python implementation built on the upstream `faiss-cpu` PyPI wheel (`faiss.IndexHNSWFlat` + `faiss.IndexIDMap2` + `faiss.write_index` + `faiss.read_index(IO_FLAG_MMAP | IO_FLAG_READ_ONLY)`). No project-side C++ vendor, no custom pybind11 wrapper. Research fact #92 + architecture.md tech-stack pin both call out this exact dependency choice.
- Constructor signature: `__init__(self, *, index_path: Path, sidecar: Sha256Sidecar, logger: Logger, warmup_query: np.ndarray | None = None)` (keyword-only). The composition-root convenience entry-point is `FaissDescriptorIndex.from_config(config)` (mirrors AZ-305 / `PostgresFilesystemStore.from_config`); it reads `config.components['c6_tile_cache'].faiss_index_path` + the optional `faiss_warmup_query_path` (NPY file), wires the static `Sha256Sidecar` facade and the project logger, and returns the constructed instance.
- `search_topk(query, k) -> list[tuple[TileId, float]]`:
  1. Validates `query.shape == (descriptor_dim,)`, `query.dtype == np.float32`, `query.flags.c_contiguous`. Mismatch → `IndexUnavailableError` (per `descriptor_index.md` § I-3).
  2. Calls `self._index.search(query.reshape(1, -1), k)` and receives `(D, I)`.
  3. Maps every non-`-1` int64 id back to `TileId` via the in-memory `_id_to_tile_id` map (built at load time from the `.meta.json` sidecar). A missing id is a corruption signal → `IndexUnavailableError`.
  4. Returns up to `k` `(TileId, float)` pairs ordered by ascending distance. Fewer-than-k results are tolerated per § I-2 (the `-1` sentinel is filtered out).
- `descriptor_dim() -> int`: returns the cached `IndexMetadata.descriptor_dim` from load time. Constant-time.
- `mmap_handle() -> Path`: returns the `index_path` constructor arg. Raises `IndexUnavailableError` if the index is not currently loaded (e.g., construction was attempted but the loader raised and the caller swallowed it).
- `rebuild_from_descriptors(descriptors, tile_ids, hnsw_params) -> None`:
  1. Validates `descriptors.shape == (len(tile_ids), descriptor_dim)`, where `descriptor_dim` is the currently loaded dimension or, on a first build, is derived from `descriptors.shape[1]`; also validates `descriptors.dtype == np.float32`, `descriptors.flags.c_contiguous`, and `len(tile_ids) > 0`. Mismatch → `IndexBuildError`.
  2. Computes deterministic int64 ids: `_int64_id_for_tile(tile_id) = int.from_bytes(sha256(f"{zoom_level}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True)`. Detects collisions across the input `tile_ids` and, on collision, raises `IndexBuildError` naming both colliding ids.
  3. Builds `IndexIDMap2(IndexHNSWFlat(d, hnsw_params.m, metric))`, where `metric` is either `METRIC_L2` or `METRIC_INNER_PRODUCT`, with `efConstruction = hnsw_params.ef_construction`, then `add_with_ids(descriptors, ids)`. Sets `efSearch = hnsw_params.ef_search` on the inner HNSW.
  4. Serialises to a sibling temp path via `faiss.write_index(idx, temp_path)`.
  5. Computes `sha256(temp_path bytes)`; composes the `IndexMetadata` JSON (`descriptor_dim`, `n_vectors`, `backbone_label`, `backbone_sha256_hex`, `built_at`, `hnsw_params`, `sidecar_sha256_hex`, `tile_id_to_int64_mapping`).
  6. Atomic-writes `<index_path>` + `<index_path>.sha256` via `Sha256Sidecar.write_atomic_and_sidecar(index_path, temp_index_bytes)`; then atomic-writes `<index_path>.meta.json` via `Sha256Sidecar.write_atomic` (no nested `.sha256` for the meta file).
  7. Removes the local temp file (best-effort; logged on failure).
  8. Reloads the in-memory index by re-running the load flow (so subsequent `search_topk` calls hit the fresh data).
  9. Emits an INFO log on success: `kind="c6.faiss.rebuilt"` with `n_vectors`, `descriptor_dim`, elapsed seconds.
- `index_metadata() -> IndexMetadata`: returns the cached `IndexMetadata` populated at load time; raises `IndexUnavailableError` if the index is not currently loaded.
- Load flow at construction:
  1. Validates `index_path` exists; if missing, raises `IndexUnavailableError` (the composition root catches it and decides; Tier-0 dev may proceed with descriptor-index-dependent paths disabled, similar to AZ-302's pattern).
  2. Reads `<index_path>.sha256` and validates it matches `sha256(<index_path>)`; mismatch → `IndexUnavailableError`.
  3. Reads `<index_path>.meta.json` and validates it parses to `IndexMetadata` AND that `meta.sidecar_sha256_hex == sha256(<index_path>)` (triple-consistency check); any mismatch → `IndexUnavailableError`.
  4. Calls `faiss.read_index(<index_path>, faiss.IO_FLAG_MMAP | faiss.IO_FLAG_READ_ONLY)`. Caches the FAISS handle.
  5. Caches `descriptor_dim`, `n_vectors`, the `_id_to_tile_id` map, and the parsed `IndexMetadata`.
  6. Sets `efSearch = meta.hnsw_params.ef_search` on the inner HNSW.
  7. If `warmup_query` is supplied, runs ONE `search_topk(warmup_query, k=1)` to page in the mmap'd file.
- `BUILD_FAISS_INDEX=OFF` semantics live at the **composition-root factory** (`runtime_root.storage_factory.build_descriptor_index`, AZ-303): the factory checks the env flag BEFORE attempting the import; with the flag OFF, `RuntimeNotAvailableError` is raised and `faiss_descriptor_index` is never loaded into `sys.modules`. The module itself is import-time-clean (no side effects); gating is at the factory boundary, not inside the impl module. This matches the established AZ-303 contract and prevents Tier-0 dev environments without `faiss-cpu` from accidentally constructing the impl.
- All third-party FAISS exceptions (`RuntimeError` from the SWIG wrapper, plus `OSError` from the underlying file ops) are caught and rewrapped into `IndexUnavailableError` (read path) or `IndexBuildError` (rebuild path) with the original via `__cause__`.
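
Steps 2 and 3 of the load flow above (the triple-consistency check) can be sketched with the standard library alone. This is a minimal illustration, not the project's loader: `check_triple_consistency`, the locally defined `IndexUnavailableError`, and the assumption that the `.sha256` sidecar's first whitespace-separated token is the hex digest are all hypothetical; the real sidecar format is defined by AZ-280.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(Exception):
    """Local stand-in for the project's error type (hypothetical definition)."""


def sha256_hex(path: Path) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_triple_consistency(index_path: Path) -> dict:
    """Return the parsed meta dict, or raise IndexUnavailableError (fail closed)."""
    sha_path = Path(str(index_path) + ".sha256")
    meta_path = Path(str(index_path) + ".meta.json")
    try:
        actual = sha256_hex(index_path)
        # Assumption: the first token of the sidecar is the hex digest.
        recorded = sha_path.read_text().split()[0]
        meta = json.loads(meta_path.read_text())
    except (OSError, ValueError, IndexError) as exc:
        # Missing files, malformed JSON, or an empty sidecar all land here.
        raise IndexUnavailableError(f"cannot validate {index_path}") from exc
    if recorded != actual:
        raise IndexUnavailableError(f"{sha_path} does not match {index_path}")
    if not isinstance(meta, dict) or meta.get("sidecar_sha256_hex") != actual:
        raise IndexUnavailableError(f"{meta_path} does not match {index_path}")
    return meta
```

Only after this check passes would the loader go on to `faiss.read_index(...)`, so a corrupted or mismatched file never reaches FAISS.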

## Scope

### Included

- `FaissDescriptorIndex` class implementation conforming to AZ-303's Protocol.
- The `<index_path>.meta.json` sidecar format: a JSON document carrying `IndexMetadata` plus the `tile_id` ↔ int64 mapping.
- The HNSW int64-id assignment scheme: a stable, deterministic mapping from `TileId` (composite tuple) to int64 id at rebuild time. Mapping function: `int.from_bytes(sha256(f"{zoom_level}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True)`. Collisions are detected at rebuild time (rebuild raises `IndexBuildError` naming both colliding tile_ids on collision).
- Construction-time `faiss.read_index(path, IO_FLAG_MMAP | IO_FLAG_READ_ONLY)` of the existing `.index` file (or `IndexUnavailableError` if absent / corrupted / sidecar-mismatched).
- Triple-consistency load check: `sha256(.index) == .sha256.content == .meta.json::sidecar_sha256_hex`.
- Optional construction-time warm-up query (no warm-up if `warmup_query=None`).
- Composition-root convenience entry-point `FaissDescriptorIndex.from_config(config)` mirroring AZ-305's `PostgresFilesystemStore.from_config`.
- BUILD_FAISS_INDEX gate enforced at the AZ-303 factory (`build_descriptor_index`), not at module import time.
- Diagnostic INFO log on construction with `n_vectors`, `descriptor_dim`, sidecar SHA-256, build timestamp; INFO on `rebuild_from_descriptors` start + end with elapsed seconds.
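
The int64-id assignment scheme listed above can be sketched as follows. This is an illustration under stated assumptions: the `TileId` named tuple and its field names, the UTF-8 encoding of the hash input, and the `assign_ids` helper are hypothetical; only the hash-and-truncate formula comes from this spec.

```python
import hashlib
from typing import Callable, NamedTuple, Sequence


class TileId(NamedTuple):
    """Hypothetical stand-in for the project's TileId; field names are assumed."""
    zoom_level: int
    lat: float
    lon: float


class IndexBuildError(Exception):
    """Local stand-in for the project's error type (hypothetical definition)."""


def int64_id_for_tile(tile_id: TileId) -> int:
    """First 8 bytes of SHA-256 over 'zoom|lat|lon', read as a signed big-endian int."""
    key = f"{tile_id.zoom_level}|{tile_id.lat:.8f}|{tile_id.lon:.8f}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # encoding is an assumption
    return int.from_bytes(digest[:8], "big", signed=True)


def assign_ids(
    tile_ids: Sequence[TileId],
    id_fn: Callable[[TileId], int] = int64_id_for_tile,
) -> list[int]:
    """Derive one id per tile, raising if two distinct tiles share an id."""
    owner: dict[int, TileId] = {}
    ids: list[int] = []
    for tile_id in tile_ids:
        value = id_fn(tile_id)
        other = owner.setdefault(value, tile_id)
        if other != tile_id:
            raise IndexBuildError(f"int64 id collision: {other} vs {tile_id}")
        ids.append(value)
    return ids
```

Passing `id_fn` as a parameter is one way to force a collision in a test (as AC-11 requires) without having to find a real SHA-256 prefix collision.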

### Excluded

- The C10 CacheProvisioner orchestration that calls `rebuild_from_descriptors`: owned by E-C10. This task exposes the API; C10 calls it.
- The C2 VPR consumer wiring of `search_topk`: owned by E-C2.
- A second `DescriptorIndex` impl (e.g., `FlatDescriptorIndex` for unit tests that don't want HNSW overhead): out of scope this cycle. Tests use a fake satisfying the Protocol where Protocol-only behaviour is tested; tests that exercise FAISS itself use the real `faiss-cpu` package.
- A custom pybind11 wrapper at `cpp/faiss_index/`: superseded by the `faiss-cpu` PyPI dep choice (research fact #92 + arch tech-stack pin); the placeholder directory is removed in this task and the `BUILD_FAISS_INDEX` flag stays as the factory-level gate only.
- A standalone operator inspect CLI: deferred (operators can use `faiss.read_index(...)` + the public `index_metadata()` accessor directly; a richer CLI is out of cycle).
- GPU FAISS variants: explicitly forbidden by AZ-303 § I-4.
- Incremental updates / online learning: F1 pre-flight is full-rebuild only per `descriptor_index.md` Non-Goals.
- Descriptor compression / PQ quantisation: out of scope this cycle (HNSW32 raw float32).
- Cross-flight `.index` sharing: parent-suite concern (D-PROJ-2).
- Backbone retraining: owned by E-C7 / E-C10.

## Acceptance Criteria

**AC-1: search_topk returns ordered ids on a known corpus**
Given a freshly-rebuilt index from 1000 known descriptors with deterministic int64 ids
When `search_topk(query=descriptors[0], k=5)` is called
Then the result is a list of 5 `(TileId, float)` pairs; the first pair's `TileId` matches `tile_ids[0]`; the first pair's distance is < 1e-6 (self-match); pairs are ordered by ascending distance

**AC-2: search_topk returns fewer-than-k when corpus is small**
Given a 3-vector corpus and `k=10`
When `search_topk(query, k=10)` is called
Then the result has length 3; every pair's `TileId` matches one of the 3 corpus tile_ids; no exception

**AC-3: search_topk rejects shape / dtype / contiguity mismatch**
Given a query with `shape=(descriptor_dim+1,)` (wrong dim), or `dtype=float64`, or `flags.c_contiguous=False`
When `search_topk(query, k=5)` is called
Then `IndexUnavailableError` is raised with a message naming the violation; no FAISS call is made (verifiable via a call counter on the monkeypatched FAISS `search` staying flat)

**AC-4: rebuild_from_descriptors atomic on crash**
Given an existing valid `.index` and `.meta.json` and `.sha256` sidecars
When `rebuild_from_descriptors` is called and the test simulates `os._exit` AFTER the temp file is written but BEFORE the atomic rename
Then on next construction the original `.index` and sidecars are intact and loadable; the temp file is left behind for cleanup at next start (cleanup is the construction-time scan's responsibility)

**AC-5: rebuild_from_descriptors writes correct sidecars**
Given a successful rebuild
When the test inspects the resulting files
Then the `.index` file's sha256 matches the `.sha256` sidecar content; the `.meta.json` `descriptor_dim` matches `descriptors.shape[1]`; `n_vectors` matches `len(tile_ids)`; `built_at` is within 1 s of the call time; `hnsw_params` matches the input

**AC-6: Construction validates sidecar coherence**
Given an `.index` whose `.sha256` sidecar content is mutated to a wrong value
When `FaissDescriptorIndex(index_path=..., sidecar=..., ...)` is constructed
Then `IndexUnavailableError` is raised with a message naming the path; the FAISS handle is not loaded (verifiable via `mmap_handle()` raising `IndexUnavailableError` on the partially-constructed object)

**AC-7: Construction validates meta.json**
Given an `.index` whose `.meta.json` is missing or contains malformed JSON
When the index is constructed
Then `IndexUnavailableError` is raised; the FAISS handle is not loaded

**AC-8: Warm-up query pages the mmap on construction**
Given an index file that is NOT in the OS page cache and a supplied `warmup_query`
When the construction returns
Then a subsequent `search_topk` p95 is < 5 ms (warm); without the warm-up, the first `search_topk` would be ≥ 100 ms (cold). The test fakes the cold state by calling `posix_fadvise(POSIX_FADV_DONTNEED)` on the index file before construction.

**AC-9: search_topk p95 latency budget**
Given a 100k-vector corpus, page cache warm
When `search_topk` is called 1000 times with random queries
Then p95 ≤ 5 ms (the test's failure threshold is 50 ms; this is a sanity bound, NOT the C2 budget; the canonical C2-PT-01 measurement is in C2's test phase)

**AC-10: BUILD_FAISS_INDEX=OFF blocks construction at the factory boundary**
Given an environment where `BUILD_FAISS_INDEX` is unset OR `OFF`/`0`/`false`
When `runtime_root.storage_factory.build_descriptor_index(config)` is called with `descriptor_index_runtime="faiss_hnsw"`
Then `RuntimeNotAvailableError` is raised naming `"faiss_hnsw"`; `gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index` is NOT present in `sys.modules` (factory checks the env flag BEFORE the import). This AC is already covered by AZ-303's `test_ac5_build_descriptor_index_flag_off_raises_no_import` and is RE-VERIFIED here against the real impl module to confirm the impl module did not introduce eager imports that defeat the gate.

**AC-11: int64-id collision detection at rebuild**
Given two `tile_ids` whose deterministic int64 mapping collides (synthetic test using a hash-seed mock)
When `rebuild_from_descriptors` is called
Then `IndexBuildError` is raised with a message naming both colliding tile_ids; no `.index` is written; the original index (if any) is untouched

**AC-12: index_metadata round-trip**
Given a rebuild with known `(descriptor_dim, n_vectors, backbone_label, backbone_sha256_hex, hnsw_params)`
When the post-rebuild `index_metadata()` is called
Then the returned `IndexMetadata` matches every field; `sidecar_sha256_hex` matches `sha256(.index)` content
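
The atomicity property in AC-4 rests on the temp-file-then-rename pattern. A minimal sketch, assuming a plain helper named `write_atomic` (hypothetical; production code goes through the AZ-280 `Sha256Sidecar` helper, whose internals are not shown in this spec):

```python
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, payload: bytes) -> None:
    """Write payload to a sibling temp file, fsync it, then rename it over path."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    # os.replace is atomic when source and destination are on the same filesystem,
    # which is why the temp file is created next to the target.
    os.replace(tmp_name, path)
```

If the process dies anywhere before `os.replace`, the target still holds the previous bytes and only a stray `.tmp` sibling remains, which is the state AC-4 expects the next construction to tolerate.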

## Non-Functional Requirements

**Performance**
- `search_topk` p95 ≤ 5 ms warm at 100k corpus (AC-9 / sanity bound; canonical budget is C2-PT-01).
- Construction with warm-up ≤ 10 s for a 100k-vector index (mmap page-in dominates; warm-up is a single search).
- `rebuild_from_descriptors` is bound by FAISS HNSW build time: minutes for 100k vectors. NOT a hot-path operation; F1 pre-flight only.

**Compatibility**
- `faiss-cpu>=1.7,<2.0` PyPI dep, ARM64 + x86_64 wheels published; numpy 1.x compatible (matches project numpy pin per leftover D-CROSS-CVE-1).
- numpy float32 C-contiguous arrays only on the search surface.

**Reliability**
- All `faiss-cpu` `RuntimeError` exceptions are caught and rewrapped into `IndexUnavailableError` (read path) / `IndexBuildError` (rebuild path) with the original via `__cause__`.
- The mmap'd file lifetime is bound to the `FaissDescriptorIndex` instance lifetime; the composition root holds the singleton for the flight.
- `rebuild_from_descriptors` is atomic: partial failure preserves the prior index. Triple-consistency load gate fails closed on any `.index` / `.sha256` / `.meta.json` mismatch.
- `.index` is never modified in place; it is always written to a temp path then atomically renamed via the AZ-280 sidecar helper.

**Concurrency**
- `search_topk` is NOT re-entrant per AZ-303 § I-8. The F3 hot path is single-threaded (description.md). Multi-threaded callers MUST use a per-thread instance (out of scope this cycle; documented as a constraint).
- `rebuild_from_descriptors` is offline; never runs concurrently with `search_topk` in the same process. F1 pre-flight is in C10's pre-flight binary; F3 is in the airborne binary.
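
The p95 bounds under Performance above need a concrete percentile definition to be testable. One possible measurement helper, assuming the nearest-rank definition of the 95th percentile (the spec does not say which definition the project's tests use; `percentile95` and `p95_ms` are hypothetical names):

```python
import time
from typing import Callable, Sequence


def percentile95(samples: Sequence[float]) -> float:
    """Nearest-rank p95: the smallest sample with at least 95% of samples <= it."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def p95_ms(fn: Callable[[], object], n_calls: int = 1000) -> float:
    """Call fn() n_calls times and return the nearest-rank p95 latency in ms."""
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return percentile95(samples)
```

A test in the style of AC-9 would then wrap the call, for example `p95_ms(lambda: index.search_topk(query, 5))`, and compare the result against the sanity bound.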

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | rebuild + search_topk on 1000 descriptors | First result self-matches at distance < 1e-6; ordered by distance |
| AC-2 | search_topk with k > corpus size | Returns corpus-size results; no exception |
| AC-3 | search_topk with wrong shape / dtype / non-contiguous | IndexUnavailableError; no FAISS call |
| AC-4 | rebuild crash mid-rename (simulated via monkeypatched `os.replace`) | Original index intact on next load |
| AC-5 | Inspect post-rebuild sidecars | `.sha256` matches; `.meta.json` matches input |
| AC-6 | Sidecar content corrupted | IndexUnavailableError on construct |
| AC-7 | `.meta.json` missing/malformed | IndexUnavailableError on construct |
| AC-8 | Warm-up forces mmap page-in after the index file is evicted from the page cache via `posix_fadvise(POSIX_FADV_DONTNEED)` (test skipped on platforms without `posix_fadvise`) | Subsequent search p95 within sanity bound |
| AC-9 | Microbench search × 1000 on 100k corpus (slow-marked) | p95 ≤ 5 ms |
| AC-10 | Factory call with `BUILD_FAISS_INDEX` unset (re-verifies the AZ-303 gate against the real impl) | `RuntimeNotAvailableError`; `faiss_descriptor_index` not in `sys.modules` |
| AC-11 | Two tile_ids whose int64 mapping collides (forced via monkeypatched id-derivation) | IndexBuildError; no `.index` written |
| AC-12 | Round-trip IndexMetadata after rebuild | Every field matches input |
| NFR-perf-rebuild | 100k vectors, time the rebuild (slow-marked) | Wall ≤ 5 minutes (sanity bound; F1 pre-flight runs offline) |
| NFR-reliability-rewrap | Inject a `RuntimeError` from FAISS via monkeypatch | Rewrapped into IndexUnavailableError / IndexBuildError; original message in `__cause__` |
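
The rewrap behaviour checked by the NFR-reliability-rewrap row can be expressed as a small context manager. This is a sketch, not the project's code: `rewrap_faiss_errors` and the two locally defined error classes are hypothetical stand-ins, and it assumes the project's error types do not themselves subclass `RuntimeError` or `OSError`.

```python
from contextlib import contextmanager
from typing import Iterator, Type


class IndexUnavailableError(Exception):
    """Local stand-in for the read-path error type (hypothetical definition)."""


class IndexBuildError(Exception):
    """Local stand-in for the rebuild-path error type (hypothetical definition)."""


@contextmanager
def rewrap_faiss_errors(target: Type[Exception], what: str) -> Iterator[None]:
    """Convert RuntimeError / OSError raised inside the block into `target`."""
    try:
        yield
    except (RuntimeError, OSError) as exc:
        # `from exc` stores the original exception on __cause__.
        raise target(f"{what}: {exc}") from exc
```

The read path would wrap its FAISS calls in `with rewrap_faiss_errors(IndexUnavailableError, "load"):` and the rebuild path in `with rewrap_faiss_errors(IndexBuildError, "rebuild"):`; any other exception type passes through unchanged.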

## Constraints

- `faiss-cpu>=1.7,<2.0`: promoted from `[indexing]` extras to main `dependencies` in this task. No version-negotiation logic.
- numpy float32 C-contiguous on all array surfaces; no auto-casting.
- HNSW only this cycle: no `IndexFlat`, no `IndexIVF*`, no GPU variants.
- `.index` files are NEVER modified in place; they are always written to a temp path and atomically renamed via the AZ-280 `Sha256Sidecar` helper.
- The int64-id deterministic mapping `int.from_bytes(sha256(f"{zoom_level}|{lat:.8f}|{lon:.8f}").digest()[:8], "big", signed=True)` is a project convention; if a future task changes it, every prior `.index` is invalidated and the operator must rebuild. (Note: `source` is intentionally NOT part of the hash input; a tile is identified by spatial position, not by which feed produced it.)
- The `<index_path>.meta.json` sidecar is the source of truth for `tile_id` ↔ int64 mapping; the `.index` file alone is insufficient (FAISS HNSW stores int64 ids only).
- Composition-root gating: `BUILD_FAISS_INDEX=OFF` → factory raises `RuntimeNotAvailableError` BEFORE attempting the import. The impl module itself is import-clean.
- The only third-party dependency this task adds to main `dependencies` is `faiss-cpu>=1.7,<2.0` (promoted from the existing `[indexing]` extras).
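
The composition-root gating constraint above (check the flag first, import second) can be illustrated as follows. This is a hedged sketch, not the AZ-303 factory: `gated_build_descriptor_index`, `flag_enabled`, and the rule that any value other than unset / `OFF` / `0` / `false` counts as ON are assumptions made for the example.

```python
import importlib
import os
import sys
from typing import Mapping


class RuntimeNotAvailableError(Exception):
    """Local stand-in for the project's error type (hypothetical definition)."""


IMPL_MODULE = "gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index"
_OFF_VALUES = {"", "off", "0", "false"}


def flag_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Treat unset, OFF, 0 and false (case-insensitive) as disabled."""
    return environ.get("BUILD_FAISS_INDEX", "").strip().lower() not in _OFF_VALUES


def gated_build_descriptor_index(config, environ: Mapping[str, str] = os.environ):
    """Check the flag BEFORE importing, so the impl never enters sys.modules when OFF."""
    if not flag_enabled(environ):
        raise RuntimeNotAvailableError(
            "descriptor index runtime 'faiss_hnsw' requires BUILD_FAISS_INDEX=ON"
        )
    # Lazy import: only reached when the gate is open.
    module = importlib.import_module(IMPL_MODULE)
    return module.FaissDescriptorIndex.from_config(config)
```

Because the import happens inside the function and after the check, a test can assert `IMPL_MODULE not in sys.modules` after the error is raised, which is the observable AC-10 relies on.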

## Risks & Mitigation

**Risk 1: faiss-cpu API breaks across major-version pins**
- *Risk*: A future bump to `faiss-cpu>=2.0` could rename or remove the `IndexHNSWFlat` / `IndexIDMap2` / `IO_FLAG_MMAP` surfaces.
- *Mitigation*: pyproject.toml pins `faiss-cpu>=1.7,<2.0`; this task's impl is the only place inside the project that depends on the FAISS surface. Pin bumps are a separate task with its own AC.

**Risk 2: Mmap'd file is replaced concurrently**
- *Risk*: An out-of-band process replaces or rewrites the `.index` file mid-flight; the mmap'd reads may then hit inconsistent bytes.
- *Mitigation*: AZ-303 § I-1 forbids mid-flight modification. The composition root holds the singleton for the flight; out-of-band renames are operator error. A future defensive task could add a periodic sidecar re-check; out of scope this cycle.

**Risk 3: Int64-id collision (cryptographic-hash) under adversarial inputs**
- *Risk*: With ~10k tiles per provisioning, the birthday-paradox collision probability for an 8-byte truncation of SHA-256 is ~10^-12; effectively zero, but adversarial inputs could engineer a collision.
- *Mitigation*: AC-11 detects collisions at rebuild time and aborts (raises `IndexBuildError`). The operator surfaces the error and either tweaks the corpus or bumps to a 16-byte id mapping; both are out-of-cycle, but the detection itself is a hard failure, never a silent one.

**Risk 4: HNSW first-query cold latency exceeds AC-NEW-1 budget**
- *Risk*: The 100k-vector index's mmap takes seconds to page in; without warm-up, the first F3 search blocks for ≥ 1 s.
- *Mitigation*: AC-8 forces a warm-up at construction; the operator's pre-flight `faiss_warmup_query_path` config field ensures it's not None in production. C10's pre-flight orchestrator is responsible for ensuring the warm-up query is supplied.

**Risk 5: faiss-cpu wheel availability on Jetson ARM64**
- *Risk*: A future Jetson minor / Python minor bump could outpace published `faiss-cpu` wheels.
- *Mitigation*: research fact #92 confirmed ARM64 wheels are published on PyPI for the supported Python range; if a future bump breaks availability, the gate fails closed at `pip install` time rather than silently degrading at runtime.
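
The ~10^-12 figure quoted in Risk 3 follows from the standard birthday-bound approximation p ≈ n(n-1) / 2^(b+1) for n items and b-bit ids. A short check, with `collision_probability` as a hypothetical helper name:

```python
def collision_probability(n_items: int, id_bits: int = 64) -> float:
    """Birthday-bound approximation, accurate while the result is much less than 1."""
    return n_items * (n_items - 1) / 2 ** (id_bits + 1)


# ~10k tiles with 64-bit ids, as in Risk 3.
p_10k = collision_probability(10_000)
# 100k vectors, the corpus size used in AC-9.
p_100k = collision_probability(100_000)
print(f"{p_10k:.2e} {p_100k:.2e}")
```

For 10k tiles this gives roughly 2.7e-12, matching the order of magnitude stated above; at 100k vectors it is still only about 2.7e-10.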

## Runtime Completeness

- **Named capability**: FAISS HNSW retrieval + atomic `.index` rebuild + triple-sidecar coherence + mmap-backed read + warm-up + third-party-exception rewrap (description.md / E-C6 / NFT-LIM-01 / D-C10-3 / AC-NEW-1).
- **Production code that must exist**: real `FaissDescriptorIndex` Python class implementing AZ-303's Protocol; real HNSW build via `faiss.IndexHNSWFlat` + `IndexIDMap2.add_with_ids`; real mmap'd read via `faiss.read_index(IO_FLAG_MMAP | IO_FLAG_READ_ONLY)`; real atomic rename via the AZ-280 sidecar helper; real `.meta.json` sidecar; real warm-up query at construction; real `RuntimeError` rewrap.
- **Allowed external stubs**: tests MAY monkey-patch `faiss.read_index` / `faiss.write_index` to inject a `RuntimeError` for the rewrap-test; tests MAY supply synthetic descriptors and tile_ids; tests MAY simulate `os.replace` failure mid-rebuild for the atomicity test. Production wiring uses the real AZ-280 helper and the real `faiss-cpu` package.
- **Unacceptable substitutes**: a SciPy / scikit-learn `NearestNeighbors` shim "for testing" (different algorithm, different latency profile, different file format; would invalidate the rebuild contract); skipping the warm-up query "to keep construction fast" (would break AC-NEW-1 cold-start budget); an in-memory id map without the `.meta.json` sidecar (would lose the tile_id ↔ int64 mapping across process restarts); a non-rewrapping handler that lets FAISS `RuntimeError` escape (would break the family invariant from AZ-303); switching to `IndexFlatL2` "because it's simpler" (would defeat AC-9 latency / D-C6-2 decision).
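
To make the `.meta.json` requirement above concrete, the sketch below round-trips a `tile_id` ↔ int64 mapping through JSON and uses the inverse map to resolve raw search output, dropping the `-1` padding that FAISS returns when fewer than `k` neighbours exist. The string key format for tile ids and the helper names are assumptions; only the `tile_id_to_int64_mapping` field name comes from this spec.

```python
import json


class IndexUnavailableError(Exception):
    """Local stand-in for the project's error type (hypothetical definition)."""


def dump_mapping(mapping: dict[str, int]) -> str:
    """Serialise the forward mapping under the meta field named in the spec."""
    return json.dumps({"tile_id_to_int64_mapping": mapping}, sort_keys=True)


def load_inverse_mapping(meta_json: str) -> dict[int, str]:
    """Rebuild the int64 -> tile id map that search results are resolved against."""
    forward = json.loads(meta_json)["tile_id_to_int64_mapping"]
    return {int(value): key for key, value in forward.items()}


def resolve_hits(distances, ids, id_to_tile: dict[int, str]) -> list[tuple[str, float]]:
    """Map raw (distance, id) pairs to tile ids, skipping the -1 sentinel."""
    hits = []
    for dist, raw_id in zip(distances, ids):
        if raw_id == -1:  # padding for missing neighbours
            continue
        if raw_id not in id_to_tile:
            # An id FAISS knows but the sidecar does not is a corruption signal.
            raise IndexUnavailableError(f"id {raw_id} missing from .meta.json mapping")
        hits.append((id_to_tile[raw_id], float(dist)))
    return hits
```

Because the inverse map is rebuilt from the sidecar on every load, the mapping survives process restarts, which an in-memory-only map would not.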

## Contract

This task implements the contract at `_docs/02_document/contracts/c6_tile_cache/descriptor_index.md`.
Consumers MUST read that file, not this task spec, to discover the interface.