[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)

Implements the C10 internal phase that walks every C6 tile, embeds each
one through C2's backbone via the AZ-321-produced engine, and rebuilds
the AZ-306 FAISS HNSW index in one atomic write.
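
The commit message does not say how the "one atomic write" is achieved.
A minimal sketch of one common way to get that property, assuming a
write-to-temp-then-rename strategy. The helper name `atomic_write_bytes`
and the idea of passing a serialized index as bytes are hypothetical and
are not the AZ-306 API:

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Hypothetical helper: replace `target` with `payload` in one step."""
    # Create the temp file next to the target so the final rename stays on
    # one filesystem; os.replace is only atomic within a single filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # readers see the old or the new file
    except BaseException:
        # On any failure, drop the partial temp file and leave the
        # previous target (if any) untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this shape, a crash midway through the write leaves the previous
index file in place, because the target path is only touched by the
final rename.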

- DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry)
- BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl
- DescriptorBatchError for OOM / dim-mismatch / missing-output failures
- Empty-corpus surfaces as outcome=failure with explicit hint to run C11
- Per-10% progress callback + DEBUG logs (no engine bytes leaked)
- Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener,
  DescriptorIndexRebuilder) so c10 stays within AZ-270 lint
- runtime_root.c10_factory adds build_descriptor_batcher + three
  C6->C10 adapters
- 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental
  (Protocol conformance, query pass-through, handle release, config)
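
The halve-and-retry recovery listed above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code
in this commit: `OutOfMemory`, `BatchFailed` and `embed_with_halving` are
hypothetical stand-ins (for the embedder's OOM-flavoured error,
`DescriptorBatchError` and the batcher loop respectively), and the retry
budget is assumed to be shared across the whole run rather than reset
per batch.

```python
from typing import Callable, List, Sequence


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for the embedder's OOM-flavoured error."""


class BatchFailed(RuntimeError):
    """Hypothetical stand-in for DescriptorBatchError."""


def embed_with_halving(
    embed: Callable[[Sequence[str]], List[List[float]]],
    tile_ids: Sequence[str],
    batch_size: int,
    max_retries: int = 1,
) -> List[List[float]]:
    """Embed all tiles, halving the batch size after each OOM."""
    rows: List[List[float]] = []
    start = 0
    size = batch_size
    retries_left = max_retries  # assumption: one budget for the whole run
    while start < len(tile_ids):
        batch = tile_ids[start:start + size]
        try:
            rows.extend(embed(batch))
        except OutOfMemory as exc:
            if retries_left == 0 or size == 1:
                # Persistent OOM: report the final batch size and tile context.
                raise BatchFailed(
                    f"OOM persisted at batch_size={size}, first tile={batch[0]}"
                ) from exc
            retries_left -= 1
            size = max(1, size // 2)
            continue  # retry the same tiles with the smaller batch
        start += len(batch)
    return rows


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Toy embedder that runs out of memory above two tiles per batch."""
    if len(batch) > 2:
        raise OutOfMemory("CUDA OOM")
    return [[0.0, 0.0] for _ in batch]


# One halving (4 -> 2) is enough here, so the default single retry succeeds.
rows = embed_with_halving(fake_embed, ["t1", "t2", "t3", "t4"], batch_size=4)
```

With `batch_size=8` and the same toy embedder, the single retry only gets
down to 4 tiles per batch, so the sketch raises `BatchFailed` instead of
returning.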

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 04:20:47 +03:00
parent 3b7265757b
commit f01a5058ab
12 changed files with 1733 additions and 10 deletions
@@ -12,6 +12,7 @@ from __future__ import annotations
__all__ = [
    "C10ProvisioningError",
    "DescriptorBatchError",
    "ManifestWriteError",
]
@@ -20,6 +21,26 @@ class C10ProvisioningError(Exception):
    """Base class for the C10 cache-provisioning error family."""


class DescriptorBatchError(C10ProvisioningError):
    """``DescriptorBatcher.populate_descriptors`` could not finish (AZ-322).

    Surfaces three failure modes:

    1. ``"CUDA OOM"`` raised by the injected
       :class:`gps_denied_onboard.components.c10_provisioning.interface.BackboneEmbedder`;
       the batcher catches the OOM-flavoured instance and triggers the
       halve-and-retry loop (AC-2). If OOM persists after retries are
       exhausted, the error is re-raised with the final batch size and
       tile-id context (AC-3).
    2. ``descriptor_dim`` mismatch: the impl returned a column count
       that does not equal :meth:`BackboneEmbedder.descriptor_dim`
       (AC-9); raised BEFORE the FAISS rebuild call so an existing
       valid index is not corrupted.
    3. Underlying FAISS rebuild failure (rewrapped from the AZ-306
       :class:`IndexBuildError` envelope).
    """


class ManifestWriteError(C10ProvisioningError):
    """``ManifestBuilder.build_manifest`` could not produce a signed Manifest.