mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 10:21:13 +00:00
[AZ-322] C10 DescriptorBatcher (faiss-cpu, OOM halve-retry)
Implements the C10 internal phase that walks every C6 tile, embeds through C2's backbone via the AZ-321-produced engine, and rebuilds the AZ-306 FAISS HNSW index in one atomic write.

- DescriptorBatcher with halve-and-retry OOM recovery (default 1 retry)
- BackboneEmbedder Protocol + C7EngineBackboneEmbedder default impl
- DescriptorBatchError for OOM / dim-mismatch / missing-output failures
- Empty-corpus surfaces as outcome=failure with explicit hint to run C11
- Per-10% progress callback + DEBUG logs (no engine bytes leaked)
- Consumer-side Protocol cuts (TilesByBboxBatchQuery, TilePixelOpener, DescriptorIndexRebuilder) so c10 stays within AZ-270 lint
- runtime_root.c10_factory adds build_descriptor_batcher + three C6->C10 adapters
- 16 unit tests covering AC-1..AC-10 + 2 NFRs + 4 supplemental (Protocol conformance, query pass-through, handle release, config)

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,141 @@
# Batch 36 — Cycle 1 Report

**Date**: 2026-05-13
**Batch**: 36 (single task, direct AZ-306 follow-up)
**Tasks**: AZ-322 (C10 Descriptor Batcher, 3pt)
**Status**: complete; AZ-322 transitioned to "In Testing" pending operator review.

## Scope

AZ-322 implements `DescriptorBatcher`, the C10 phase that walks every C6 tile matching the requested
`(bbox, zoom_levels, sector_class)` filter, embeds it through C2's VPR backbone (via the C7 engine
produced by AZ-321), and rebuilds the AZ-306 FAISS HNSW index in one atomic write.

This unblocks the airborne C2 VPR step's takeoff verify (AC-NEW-1) and makes the C10-PT-01
cold-build budget observable end-to-end.

## Architectural Decisions
### 1. Consumer-side Protocol cuts (AZ-270 / AZ-507 compliance)

The AZ-322 task spec listed direct C6 types (`TileMetadataStore`, `TileStore`, `DescriptorIndex`)
in the `DescriptorBatcher.__init__` signature. That contradicts AZ-270 (no cross-component
imports inside `components/*`) and the AZ-507 cross-component contract surface rule. The
established precedent, set by AZ-323's `ManifestBuilder` and AZ-324's `ManifestVerifierImpl`, declares
**consumer-side structural Protocol cuts** locally inside the C10 module and lets the composition
root (`runtime_root.c10_factory`) wire C6's concrete strategies in via thin adapters.

This batch follows that precedent. C10 declares four local Protocols: `BackboneEmbedder` in
`interface.py` and the other three in `descriptor_batcher.py`:

- `BackboneEmbedder` (lifted to `interface.py` for re-use by future tasks)
- `TilesByBboxBatchQuery`: narrower than C6's `TileMetadataStore.query_by_bbox`; accepts
  `tuple[int, ...]` of zooms instead of a single zoom
- `TilePixelOpener`: narrower than C6's `TileStore.read_tile_pixels(TileId)`; takes
  `(zoom, lat, lon)` and returns a context manager
- `DescriptorIndexRebuilder`: narrower than C6's
  `DescriptorIndex.rebuild_from_descriptors(descriptors, tile_ids: list[TileId], hnsw_params: HnswParams)`;
  takes `tile_records: list[TileBboxRecord]` plus individual HNSW kwargs

The matching adapters live in `runtime_root/c10_factory.py`:

- `c6_tile_metadata_store_to_tiles_batch_query`: loops over `zoom_levels` and projects `TileMetadata`
  rows down to the four-field `TileBboxRecord`
- `c6_tile_store_to_pixel_opener`: builds a `TileId` and returns the C6 `TilePixelHandle` (already
  a context manager)
- `c6_descriptor_index_to_rebuilder`: projects `TileBboxRecord` → `TileId` and folds HNSW kwargs
  into `HnswParams`
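The pattern above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `descriptor_batcher.py`: the method name `query_tiles`, the `TileBboxRecord` field names, and the `Fake*` classes are hypothetical, and only the Protocol name, the adapter function name, and `query_by_bbox` come from this report.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class TileBboxRecord:
    """Four-field record; the field names here are assumptions."""
    zoom: int
    lat: float
    lon: float
    sector_class: str


@runtime_checkable
class TilesByBboxBatchQuery(Protocol):
    """Consumer-side cut declared inside C10: one call covers several zooms."""

    def query_tiles(
        self,
        bbox: tuple[float, float, float, float],
        zoom_levels: tuple[int, ...],
        sector_class: str,
    ) -> list[TileBboxRecord]: ...


@dataclass(frozen=True)
class FakeTileMetadata:
    """Stand-in for a C6 TileMetadata row, which carries more than C10 needs."""
    zoom: int
    lat: float
    lon: float
    sector_class: str
    source: str


class FakeC6MetadataStore:
    """Stand-in for C6's store: answers a single zoom level per call."""

    def query_by_bbox(self, bbox, zoom, sector_class):
        return [FakeTileMetadata(zoom, bbox[0], bbox[1], sector_class, "fake")]


class _TilesBatchQueryAdapter:
    def __init__(self, store: Any) -> None:
        self._store = store

    def query_tiles(self, bbox, zoom_levels, sector_class):
        records: list[TileBboxRecord] = []
        for zoom in zoom_levels:  # C6 is queried once per zoom level
            for row in self._store.query_by_bbox(bbox, zoom, sector_class):
                # Project the wider C6 row down to the four fields C10 uses.
                records.append(
                    TileBboxRecord(row.zoom, row.lat, row.lon, row.sector_class)
                )
        return records


def c6_tile_metadata_store_to_tiles_batch_query(store: Any) -> TilesByBboxBatchQuery:
    """Composition-root adapter; only this layer would import C6 types."""
    return _TilesBatchQueryAdapter(store)
```

Because the Protocol is structural, the adapter never has to inherit from it; a `runtime_checkable` `isinstance` check only confirms that the expected method is present, which is presumably what the supplemental Protocol conformance test exercises.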
### 2. `C7EngineBackboneEmbedder` adapter — `Any`-typed at the c7 boundary

The default `BackboneEmbedder` impl wraps an AZ-297 `InferenceRuntime` + an AZ-321-compiled
`EngineHandle`. Importing those types, even under `TYPE_CHECKING`, fails the AZ-270 AST lint
because the lint walks `ast.ImportFrom` nodes regardless of context. We therefore type the
constructor parameters as `Any` and rely on structural duck-typing
(`inference_runtime.infer(handle, dict) -> dict`). The composition root wires the concrete C7
runtime in.
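To see why a `TYPE_CHECKING` guard does not help, consider a minimal sketch of a lint that walks `ast.ImportFrom` nodes. The checking logic and the module path `gps_denied_onboard.components.c7_inference.runtime` are assumptions made for illustration; the real AZ-270 lint may apply different rules.

```python
import ast

# Hypothetical source file inside the c10 component. The c7 module path is
# invented for this example.
SOURCE = '''
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from gps_denied_onboard.components.c7_inference.runtime import InferenceRuntime
'''


def cross_component_imports(source: str, own_component: str) -> list[str]:
    """Return imported modules under components/* that belong to another component."""
    offenders = []
    # ast.walk visits every node in the tree, including those nested in `if` bodies,
    # so an import guarded by TYPE_CHECKING is reported like any other.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            parts = node.module.split(".")
            if "components" in parts:
                idx = parts.index("components")
                if idx + 1 < len(parts) and parts[idx + 1] != own_component:
                    offenders.append(node.module)
    return offenders


offenders = cross_component_imports(SOURCE, own_component="c10_provisioning")
```

Here `offenders` contains the guarded c7 import, since the walk has no notion of whether the enclosing `if` ever runs. Typing the parameters as `Any` removes the import statement altogether, so there is nothing for such a lint to find.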
### 3. JPEG → tensor preprocessing is injected, not owned

`C7EngineBackboneEmbedder` accepts a `tile_decoder: Callable[[Any], np.ndarray]` rather than
hard-wiring OpenCV / Pillow / torchvision. Image preprocessing belongs to E-C2 (AZ-255); when
it ships, the composition root injects a real decoder. Until then the adapter stays free of
imaging-stack dependencies, keeping AZ-322's surface narrow and the test surface tiny.
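A rough sketch of how an `Any`-typed embedder with an injected decoder might look is given below. It is not the shipped `C7EngineBackboneEmbedder`: the class name `SketchEngineEmbedder`, the `"input"` and `"descriptors"` dictionary keys, the `descriptor_dim` property form, and `FakeRuntime` are assumptions, and plain lists stand in for `np.ndarray` so that the example runs without numpy.

```python
from typing import Any, Callable, Sequence

Tensor = list[float]  # stand-in for np.ndarray so the sketch needs no numpy


class DescriptorBatchError(Exception):
    """Local stand-in for the C10 error class named in this report."""


class SketchEngineEmbedder:
    def __init__(
        self,
        inference_runtime: Any,  # duck-typed: needs infer(handle, dict) -> dict
        engine_handle: Any,
        tile_decoder: Callable[[Any], Tensor],  # injected, not owned
        descriptor_dim: int,
    ) -> None:
        self._runtime = inference_runtime
        self._handle = engine_handle
        self._decode = tile_decoder
        self._dim = descriptor_dim

    @property
    def descriptor_dim(self) -> int:
        return self._dim

    def embed_batch(self, tiles: Sequence[Any]) -> list[Tensor]:
        batch = [self._decode(tile) for tile in tiles]
        outputs = self._runtime.infer(self._handle, {"input": batch})
        if "descriptors" not in outputs:  # hypothetical output key
            raise DescriptorBatchError("engine returned no descriptor output")
        descriptors = outputs["descriptors"]
        if any(len(d) != self._dim for d in descriptors):
            raise DescriptorBatchError("descriptor dimension mismatch")
        return descriptors


class FakeRuntime:
    """Test double: emits a 2-dimensional descriptor per decoded tile."""

    def infer(self, handle: Any, inputs: dict) -> dict:
        return {"descriptors": [[sum(t), float(len(t))] for t in inputs["input"]]}


embedder = SketchEngineEmbedder(
    FakeRuntime(), object(), lambda raw: [float(b) for b in raw], descriptor_dim=2
)
descriptors = embedder.embed_batch([b"\x01\x02", b"\x03"])
```

With this shape, a unit test can pass a trivial lambda as the decoder and a fake runtime, which is consistent with the report's statement that the adapter is tested via fakes.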
### 4. Descriptor int64 id formula — reuse AZ-306, do not invent

`DescriptorBatcher` does NOT recompute the int64 id formula. It hands `TileBboxRecord` rows to
the rebuilder; the rebuilder adapter projects them to `TileId`; AZ-306's
`FaissDescriptorIndex.rebuild_from_descriptors` uses the canonical
`tile_id_to_int64(TileId)` helper. Test `test_ac6_descriptor_id_mapping_matches_az306_scheme`
confirms this by importing `tile_id_to_int64` directly and asserting against the
`int.from_bytes(sha256("zoom|lat|lon").first8, "big", signed=True)` formula.
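Read literally, the formula quoted above could be written as the following sketch. The helper name `sketch_tile_id_to_int64`, the UTF-8 encoding, and the way `lat` and `lon` are rendered into the key string are assumptions; the canonical formatting lives in AZ-306's `tile_id_to_int64` and may differ.

```python
import hashlib


def sketch_tile_id_to_int64(zoom: int, lat: float, lon: float) -> int:
    """Hash "zoom|lat|lon" and keep the first 8 digest bytes as a signed int64."""
    # Assumption: default str() rendering of the floats and UTF-8 encoding.
    key = f"{zoom}|{lat}|{lon}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


tile_id = sketch_tile_id_to_int64(16, 50.4501, 30.5234)
```

The result always fits in a signed 64-bit integer and is stable across runs, which is why the batcher can rely on the index side to derive ids instead of carrying its own copy of the formula.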
## Files Changed
### Production code (new)

- `src/gps_denied_onboard/components/c10_provisioning/descriptor_batcher.py`: `DescriptorBatcher`
  class + `BatcherTile`, `TileBboxRecord`, `CorpusFilter`, `ProgressEvent`, `DescriptorBatchReport`,
  `BatcherOutcome`, `C10BatcherConfig` DTOs + `TilesByBboxBatchQuery`, `TilePixelOpener`,
  `DescriptorIndexRebuilder` consumer Protocols.
- `src/gps_denied_onboard/components/c10_provisioning/c7_engine_embedder.py`:
  `C7EngineBackboneEmbedder` adapter wrapping the AZ-297 `InferenceRuntime` surface; `Any`-typed
  to stay below the AZ-270 boundary.

### Production code (modified)

- `src/gps_denied_onboard/components/c10_provisioning/interface.py`: added `BackboneEmbedder`
  Protocol (`embed_batch` + `descriptor_dim`), `runtime_checkable`.
- `src/gps_denied_onboard/components/c10_provisioning/errors.py`: added `DescriptorBatchError`
  exception class extending `C10ProvisioningError`.
- `src/gps_denied_onboard/components/c10_provisioning/__init__.py`: re-exported all new symbols.
- `src/gps_denied_onboard/runtime_root/c10_factory.py`: added `build_descriptor_batcher` plus
  the three C6→C10 adapter functions.

### Tests (new)

- `tests/unit/c10_provisioning/test_descriptor_batcher.py`: 16 tests covering AC-1 through
  AC-10 + NFR-perf-overhead + NFR-reliability-bounded-retry, plus 4 supplemental tests
  (`Protocol` runtime-check for the four consumer cuts, query-args pass-through, handle
  release on embed failure, config validation).

### Documentation

- `_docs/02_document/module-layout.md`: c10 Public API + Internal section updated to list the
  AZ-322 surface; composition root section lists the new factory + adapters.
- `_docs/02_document/components/11_c10_provisioning/description.md`: §5 dependency table picks
  up `numpy`; new "AZ-322 internal phase" subsection summarises the batcher's
  contract / OOM behaviour / progress reporting / id formula.

## Test Results

- 16 / 16 AZ-322 tests pass (`tests/unit/c10_provisioning/test_descriptor_batcher.py`).
- 197 / 197 c10 + c6 + runtime-root targeted runs pass (59 docker-skip).
- Full project suite: **1352 passed, 79 skipped, 1 failed**.
  - 79 skipped: docker / Jetson / CUDA / actionlint env-gated (Tier-0 dev host).
  - 1 failed: `tests/unit/test_ac1_scaffold_layout.py::test_cmake_files_configure`,
    a pre-existing OKVIS2 git-submodule failure documented in the batch_35 cycle report; unrelated
    to this batch.

## Decisions Ledger

| Decision | Rationale |
|----------|-----------|
| `DescriptorBatcher.__init__` takes consumer-side Protocols, not raw C6 types | AZ-270 lint blocks direct cross-component imports; AZ-323 / AZ-324 set the precedent |
| `C7EngineBackboneEmbedder` parameters are `Any`-typed | AZ-270 AST lint flags `TYPE_CHECKING` imports too; structural duck-typing avoids the boundary |
| `tile_decoder` is injected, not bundled | JPEG preprocessing belongs to E-C2 (AZ-255); keeping it out of AZ-322 narrows scope and dependencies |
| Default `C10BatcherConfig.max_oom_retries=1` | Spec NFR-reliability-bounded-retry; a single halving from 64 → 32 is the expected recovery path, and deeper retries would mask GPU regressions |
| Reuse AZ-306's `tile_id_to_int64` | Spec AC-6; inventing the formula here would diverge from C6's id scheme |
| Atomic FAISS rebuild guaranteed by AZ-306, not duplicated here | Spec AC-7; the batcher's role is to call `rebuild_from_descriptors` exactly once |
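The bounded halve-and-retry behaviour behind `max_oom_retries=1` can be sketched as follows. This is an illustration only: the function name `embed_with_halve_retry`, the use of Python's built-in `MemoryError` as the OOM signal, and the choice to count retries per run (not per batch) are assumptions that the report does not confirm.

```python
from typing import Any, Callable, Sequence


class DescriptorBatchError(Exception):
    """Local stand-in for the C10 error class named in this report."""


def embed_with_halve_retry(
    tiles: Sequence[Any],
    embed_batch: Callable[[Sequence[Any]], list],
    batch_size: int = 64,
    max_oom_retries: int = 1,
) -> tuple[list, int]:
    """Embed all tiles; on OOM, halve the batch size at most max_oom_retries times."""
    descriptors: list = []
    retries_left = max_oom_retries
    start = 0
    while start < len(tiles):
        chunk = tiles[start:start + batch_size]
        try:
            descriptors.extend(embed_batch(chunk))
        except MemoryError as exc:  # assumed OOM signal for this sketch
            if retries_left == 0 or batch_size == 1:
                raise DescriptorBatchError(
                    f"OOM at batch_size={batch_size}, retries exhausted"
                ) from exc
            retries_left -= 1
            batch_size = max(1, batch_size // 2)
            continue  # retry the same offset with the smaller batch
        start += len(chunk)
    return descriptors, batch_size


def fake_embed(limit: int) -> Callable[[Sequence[Any]], list]:
    """Fake embedder that runs out of memory above `limit` tiles per batch."""
    def _embed(chunk: Sequence[Any]) -> list:
        if len(chunk) > limit:
            raise MemoryError("simulated OOM")
        return [[float(t)] for t in chunk]
    return _embed


result, final_batch_size = embed_with_halve_retry(list(range(100)), fake_embed(32))
```

In this run the first batch of 64 fails, the size drops to 32, and the remaining work completes at that size. If the halved batch also fails, the error surfaces instead of being retried again, which matches the stated intent of not masking GPU regressions.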
## Notes

- The `C7EngineBackboneEmbedder` is the default `BackboneEmbedder` impl, but production wiring
  to a real C7 engine awaits AZ-326 (T5 orchestrator) and AZ-255 (real C2 backbone preprocessing).
  The adapter is unit-tested via fakes today; integration tests land with AZ-326.
- `C10BatcherConfig` currently has no dedicated config-block hook in
  `C10ProvisioningConfig`; `build_descriptor_batcher` uses defaults. AZ-326 will add the
  config-block plumbing.
- The OKVIS2 cmake submodule failure remains and is independent of every batch-35 / batch-36
  change. It will resolve when the project's submodules are initialised on the dev host.