[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)

Implements the C10 internal phase that walks every C6 tile, embeds
through C2's backbone via the AZ-321-produced engine, and rebuilds
the AZ-306 FAISS HNSW index in one atomic write.

- DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry)
- BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl
- DescriptorBatchError for OOM / dim-mismatch / missing-output failures
- Empty-corpus surfaces as outcome=failure with explicit hint to run C11
- Per-10% progress callback + DEBUG logs (no engine bytes leaked)
- Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener,
  DescriptorIndexRebuilder) so c10 stays within AZ-270 lint
- runtime_root.c10_factory adds build_descriptor_batcher + three
  C6->C10 adapters
- 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental
  (Protocol conformance, query pass-through, handle release, config)

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 04:20:47 +03:00
parent 3b7265757b
commit f01a5058ab
12 changed files with 1733 additions and 10 deletions
@@ -0,0 +1,141 @@
# Batch 36 — Cycle 1 Report
**Date**: 2026-05-13
**Batch**: 36 (single task — direct AZ-306 follow-up)
**Tasks**: AZ-322 (C10 Descriptor Batcher, 3pt)
**Status**: complete; AZ-322 transitioned to "In Testing" pending operator review.
## Scope
AZ-322 implements `DescriptorBatcher` — the C10 phase that walks every C6 tile in the requested
`(bbox, zoom_levels, sector_class)`, embeds it through C2's VPR backbone (via the C7 engine produced
by AZ-321), and rebuilds the AZ-306 FAISS HNSW index in one atomic write.
This unblocks the airborne C2 VPR step's takeoff verify (AC-NEW-1) and makes the C10-PT-01
cold-build budget observable end-to-end.
## Architectural Decisions
### 1. Consumer-side Protocol cuts (AZ-270 / AZ-507 compliance)
The AZ-322 task spec listed direct C6 types (`TileMetadataStore`, `TileStore`, `DescriptorIndex`)
in the `DescriptorBatcher.__init__` signature. That contradicts AZ-270 (no cross-component
imports inside `components/*`) and the AZ-507 cross-component contract surface rule. The
established precedent — AZ-323's `ManifestBuilder` and AZ-324's `ManifestVerifierImpl` — declares
**consumer-side structural Protocol cuts** locally inside the C10 module and lets the composition
root (`runtime_root.c10_factory`) wire C6's concrete strategies in via thin adapters.
This batch follows that precedent with four local-to-C10 Protocols, three declared in
`descriptor_batcher.py` and one lifted to `interface.py`:
- `BackboneEmbedder` (lifted to `interface.py` for re-use by future tasks)
- `TilesByBboxBatchQuery` — narrower than C6's `TileMetadataStore.query_by_bbox`, accepts
`tuple[int, ...]` of zooms instead of a single zoom
- `TilePixelOpener` — narrower than C6's `TileStore.read_tile_pixels(TileId)`; takes
`(zoom, lat, lon)` and returns a context manager
- `DescriptorIndexRebuilder` — narrower than C6's
`DescriptorIndex.rebuild_from_descriptors(descriptors, tile_ids: list[TileId], hnsw_params: HnswParams)`;
takes `tile_records: list[TileBboxRecord]` plus individual HNSW kwargs
The matching adapters live in `runtime_root/c10_factory.py`:
- `c6_tile_metadata_store_to_tiles_batch_query` — loops over `zoom_levels`, projects `TileMetadata`
rows down to the four-field `TileBboxRecord`
- `c6_tile_store_to_pixel_opener` — builds `TileId` and returns the C6 `TilePixelHandle` (already
a context manager)
- `c6_descriptor_index_to_rebuilder` — projects `TileBboxRecord` → `TileId` and folds HNSW kwargs
into `HnswParams`
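
The Protocol-cut-plus-adapter pattern above can be sketched as follows. This is a minimal illustration, not the code in this commit: the method name `query`, the `TileBboxRecord` field names, the dict-shaped rows, and `FakeC6MetadataStore` are all assumptions made for the example; only the adapter function name and the "loop over zooms, project to four fields" behaviour come from the report.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class TileBboxRecord:
    # Hypothetical four-field projection; the real field names are not in the report.
    zoom: int
    lat: float
    lon: float
    sector_class: str


@runtime_checkable
class TilesByBboxBatchQuery(Protocol):
    # Consumer-side cut: accepts a tuple of zooms instead of a single zoom.
    def query(
        self, bbox: tuple, zoom_levels: tuple[int, ...], sector_class: str
    ) -> list[TileBboxRecord]: ...


class FakeC6MetadataStore:
    """Stand-in for C6's TileMetadataStore (single-zoom query, wider rows)."""

    def query_by_bbox(self, bbox: tuple, zoom: int, sector_class: str) -> list[dict]:
        return [{"zoom": zoom, "lat": 50.0, "lon": 30.0,
                 "sector_class": sector_class, "etag": "not-needed-by-c10"}]


class _BatchQueryAdapter:
    def __init__(self, store: Any) -> None:
        self._store = store  # Any-typed: C10 never imports the C6 type

    def query(self, bbox, zoom_levels, sector_class):
        records = []
        for zoom in zoom_levels:  # one C6 call per requested zoom
            for row in self._store.query_by_bbox(bbox, zoom, sector_class):
                records.append(TileBboxRecord(
                    row["zoom"], row["lat"], row["lon"], row["sector_class"]))
        return records


def c6_tile_metadata_store_to_tiles_batch_query(store: Any) -> TilesByBboxBatchQuery:
    """Composition-root adapter: wraps a C6 store behind the C10 Protocol."""
    return _BatchQueryAdapter(store)


adapter = c6_tile_metadata_store_to_tiles_batch_query(FakeC6MetadataStore())
records = adapter.query((29.9, 49.9, 30.1, 50.1), (15, 16), "urban")
```

Because the Protocol is structural, the adapter class never has to inherit from it, and the C10 module stays free of C6 imports.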
### 2. `C7EngineBackboneEmbedder` adapter — `Any`-typed at the c7 boundary
The default `BackboneEmbedder` impl wraps an AZ-297 `InferenceRuntime` + an AZ-321-compiled
`EngineHandle`. Importing those types — even under `TYPE_CHECKING` — fails the AZ-270 AST lint
because the lint walks `ast.ImportFrom` nodes regardless of context. We therefore type the
constructor parameters as `Any` and rely on structural duck-typing
(`inference_runtime.infer(handle, dict) -> dict`). The composition root wires the concrete C7
runtime in.
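
To illustrate why a `TYPE_CHECKING` guard does not help, here is a rough sketch of an AST check that walks `ast.ImportFrom` nodes without looking at their enclosing block. It is not the AZ-270 lint itself: the function name and the C7 module path in `SOURCE` are hypothetical, chosen only to show the behaviour described above.

```python
from __future__ import annotations

import ast

# Hypothetical module path; the report does not name C7's package.
SOURCE = '''
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from gps_denied_onboard.components.c7_inference import InferenceRuntime
'''


def find_cross_component_imports(source: str, forbidden_prefix: str) -> list[str]:
    """Return every `from X import ...` module that starts with forbidden_prefix.

    ast.walk visits nodes nested inside `if` bodies too, so an import under
    `if TYPE_CHECKING:` is reported exactly like a top-level one.
    """
    tree = ast.parse(source)
    return [
        node.module
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and node.module is not None
        and node.module.startswith(forbidden_prefix)
    ]


violations = find_cross_component_imports(SOURCE, "gps_denied_onboard.components.c7")
```

Under a check of this shape the guarded import is still a violation, which is why typing the parameters as `Any` is the only way to keep the C7 names out of the C10 module entirely.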
### 3. JPEG → tensor preprocessing is injected, not owned
`C7EngineBackboneEmbedder` accepts a `tile_decoder: Callable[[Any], np.ndarray]` rather than
hard-wiring OpenCV / Pillow / torchvision. Image preprocessing belongs to E-C2 (AZ-255); when
it ships, the composition root injects a real decoder. Until then the adapter stays free of
imaging-stack dependencies, keeping AZ-322's surface narrow and the test surface tiny.
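
The decoder injection can be sketched as below. This is an assumption-laden illustration rather than the adapter in this commit: the class name `SketchEmbedder`, the `"input"` / `"descriptor"` dict keys, and the fake runtime are invented for the example, and plain Python lists stand in for `np.ndarray` so the sketch has no third-party dependency.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

Tensor = list  # stand-in for np.ndarray in this dependency-free sketch


class SketchEmbedder:
    """Embedder that owns no image decoding; the caller supplies tile_decoder."""

    def __init__(
        self,
        inference_runtime: Any,  # duck-typed: needs infer(handle, dict) -> dict
        engine_handle: Any,
        tile_decoder: Callable[[Any], Tensor],
    ) -> None:
        self._runtime = inference_runtime
        self._handle = engine_handle
        self._decode = tile_decoder

    def embed_batch(self, tiles: Sequence[Any]) -> list[Tensor]:
        batch = [self._decode(tile) for tile in tiles]
        # "input" and "descriptor" are hypothetical tensor names.
        outputs = self._runtime.infer(self._handle, {"input": batch})
        return outputs["descriptor"]


class FakeRuntime:
    """Test double: the 'descriptor' is just the mean of each decoded tile."""

    def infer(self, handle: Any, inputs: dict) -> dict:
        return {"descriptor": [[sum(t) / len(t)] for t in inputs["input"]]}


def fake_decoder(raw: bytes) -> Tensor:
    # Placeholder for real JPEG decoding: scale raw bytes to [0, 1].
    return [b / 255 for b in raw]


embedder = SketchEmbedder(FakeRuntime(), engine_handle=object(), tile_decoder=fake_decoder)
descriptors = embedder.embed_batch([bytes([0, 255])])
```

Swapping `fake_decoder` for a real decoder later changes only the composition root, not the embedder.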
### 4. Descriptor int64 id formula — reuse AZ-306, do not invent
`DescriptorBatcher` does NOT recompute the int64 id formula. It hands `TileBboxRecord` rows to
the rebuilder; the rebuilder adapter projects to `TileId`; AZ-306's
`FaissDescriptorIndex.rebuild_from_descriptors` uses the canonical
`tile_id_to_int64(TileId)` helper. Test `test_ac6_descriptor_id_mapping_matches_az306_scheme`
confirms by importing `tile_id_to_int64` directly and asserting against the
`int.from_bytes(sha256("zoom|lat|lon").first8, "big", signed=True)` formula.
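
For orientation, the formula quoted above could be written out roughly as follows. This is a sketch only: the canonical helper is AZ-306's `tile_id_to_int64(TileId)`, and the exact string formatting of `zoom`, `lat`, and `lon` inside the hashed key is an assumption here, since the report gives only the `"zoom|lat|lon"` shape.

```python
import hashlib


def tile_id_to_int64_sketch(zoom: int, lat: float, lon: float) -> int:
    """Illustrative re-statement of the id formula; not the AZ-306 helper."""
    # Assumed key format: default str() rendering joined with "|".
    key = f"{zoom}|{lat}|{lon}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # First 8 bytes, big-endian, interpreted as a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)


example_id = tile_id_to_int64_sketch(15, 50.45, 30.52)
```

Interpreting the prefix as signed keeps every id inside the int64 range that FAISS uses for vector ids.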
## Files Changed
### Production code (new)
- `src/gps_denied_onboard/components/c10_provisioning/descriptor_batcher.py` — `DescriptorBatcher`
class + `BatcherTile`, `TileBboxRecord`, `CorpusFilter`, `ProgressEvent`, `DescriptorBatchReport`,
`BatcherOutcome`, `C10BatcherConfig` DTOs + `TilesByBboxBatchQuery`, `TilePixelOpener`,
`DescriptorIndexRebuilder` consumer Protocols.
- `src/gps_denied_onboard/components/c10_provisioning/c7_engine_embedder.py` —
`C7EngineBackboneEmbedder` adapter wrapping the AZ-297 `InferenceRuntime` surface; `Any`-typed
to stay below the AZ-270 boundary.
### Production code (modified)
- `src/gps_denied_onboard/components/c10_provisioning/interface.py` — added `BackboneEmbedder`
Protocol (`embed_batch` + `descriptor_dim`), `runtime_checkable`.
- `src/gps_denied_onboard/components/c10_provisioning/errors.py` — added `DescriptorBatchError`
exception class extending `C10ProvisioningError`.
- `src/gps_denied_onboard/components/c10_provisioning/__init__.py` — re-exported all new symbols.
- `src/gps_denied_onboard/runtime_root/c10_factory.py` — added `build_descriptor_batcher` plus
the three C6→C10 adapter functions.
### Tests (new)
- `tests/unit/c10_provisioning/test_descriptor_batcher.py` — 16 tests covering AC-1 through
AC-10 + NFR-perf-overhead + NFR-reliability-bounded-retry, plus 4 supplemental tests
(`Protocol` runtime-check for the four consumer cuts, query-args pass-through, handle
release on embed failure, config validation).
### Documentation
- `_docs/02_document/module-layout.md` — c10 Public API + Internal section updated to list the
AZ-322 surface; composition root section lists the new factory + adapters.
- `_docs/02_document/components/11_c10_provisioning/description.md` — §5 dependency table picks
up `numpy`; new "AZ-322 internal phase" subsection summarises the batcher's
contract / OOM behaviour / progress reporting / id formula.
## Test Results
- 16 / 16 AZ-322 tests pass (`tests/unit/c10_provisioning/test_descriptor_batcher.py`).
- 197 / 197 c10 + c6 + runtime-root targeted runs pass (59 docker-skip).
- Full project suite: **1352 passed, 79 skipped, 1 failed**.
- 79 skipped: docker / Jetson / CUDA / actionlint env-gated (Tier-0 dev host).
- 1 failed: `tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure` —
pre-existing OKVIS2 git-submodule failure documented in batch_35 cycle report; unrelated
to this batch.
## Decisions Ledger
| Decision | Rationale |
|----------|-----------|
| `DescriptorBatcher.__init__` takes consumer-side Protocols, not raw C6 types | AZ-270 lint blocks direct cross-component imports; AZ-323 / AZ-324 set the precedent |
| `C7EngineBackboneEmbedder` parameters are `Any`-typed | AZ-270 AST lint flags `TYPE_CHECKING` imports too; structural duck-typing avoids the boundary |
| `tile_decoder` is injected, not bundled | JPEG preprocessing belongs to E-C2 (AZ-255); keeping it out of AZ-322 narrows scope and dependencies |
| Default `C10BatcherConfig.max_oom_retries=1` | Spec NFR-reliability-bounded-retry; a single halving (batch size 64 → 32) is the expected recovery path, and deeper retries would mask GPU regressions |
| Reuse AZ-306's `tile_id_to_int64` | Spec AC-6; inventing the formula here would diverge from C6's id scheme |
| Atomic FAISS rebuild guaranteed by AZ-306, not duplicated here | Spec AC-7; the batcher's role is to call `rebuild_from_descriptors` exactly once |
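
The bounded halve-and-retry policy in the ledger can be sketched as follows. This is a hedged illustration, not the `DescriptorBatcher` code: the function name, both exception classes, and the choice to spend the retry budget across the whole run (rather than per batch) are assumptions; only the 64 → 32 halving and the default of one retry come from the report.

```python
from __future__ import annotations

from typing import Callable, Sequence


class SketchOomError(RuntimeError):
    """Hypothetical stand-in for whatever the engine raises on OOM."""


class SketchBatchError(RuntimeError):
    """Hypothetical stand-in for DescriptorBatchError."""


def embed_with_halving(
    tiles: Sequence[int],
    embed_batch: Callable[[Sequence[int]], list],
    batch_size: int = 64,
    max_oom_retries: int = 1,
) -> list:
    results: list = []
    retries_left = max_oom_retries  # assumed: one budget for the whole run
    i = 0
    while i < len(tiles):
        chunk = tiles[i:i + batch_size]
        try:
            results.extend(embed_batch(chunk))
        except SketchOomError as exc:
            if retries_left == 0 or batch_size == 1:
                raise SketchBatchError(
                    f"OOM at batch_size={batch_size}, retries exhausted") from exc
            retries_left -= 1
            batch_size = max(1, batch_size // 2)  # halve, then retry same offset
            continue
        i += len(chunk)
    return results


call_sizes: list[int] = []


def fake_embed(chunk: Sequence[int]) -> list:
    # Fake engine that only fits 32 tiles at a time.
    call_sizes.append(len(chunk))
    if len(chunk) > 32:
        raise SketchOomError("simulated OOM")
    return [t * 2 for t in chunk]


embedded = embed_with_halving(list(range(100)), fake_embed)
```

With the fake above, the first 64-tile call fails, the batch size drops to 32, and the run completes without re-embedding any tile. A second OOM would surface as an error instead of halving again.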
## Notes
- The `C7EngineBackboneEmbedder` is the default `BackboneEmbedder` impl, but production wiring
to a real C7 engine awaits AZ-326 (T5 orchestrator) and AZ-255 (real C2 backbone preprocessing).
The adapter is unit-tested via fakes today; integration tests land with AZ-326.
- `C10BatcherConfig` currently has no dedicated config-block hook in
`C10ProvisioningConfig`; `build_descriptor_batcher` uses defaults. AZ-326 will add the
config-block plumbing.
- The OKVIS2 cmake submodule failure remains and is independent of every batch-35 / batch-36
change. It will resolve when the project's submodules are initialised on the dev host.