mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 17:01:14 +00:00
[AZ-336] C2 VprStrategy: Protocol + DTOs + factory + composition
Foundational scaffolding for every concrete C2 backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD — AZ-337..AZ-340) and the C2.5 ReRanker consumer side. No backbone is implemented here. * VprStrategy Protocol (embed_query / retrieve_topk / descriptor_dim) + BackbonePreprocessor C2-internal Protocol (NOT in Public API per description.md § 6). * DTOs in L1 _types/vpr.py — VprQuery, VprCandidate, VprResult; all frozen + slots; tuple-not-list for VprResult.candidates so the immutability invariant truly holds. * Error family: VprError + VprBackboneError + VprPreprocessError + IndexUnavailableError; same-named but namespace-distinct from c6_tile_cache.IndexUnavailableError (the c2 family is the closed envelope C5 / C2.5 consume; concrete strategies rewrap the C6 form). * C2VprConfig (strategy enum + backbone_weights_path + faiss_index_path) with strict validation at load; registered into Config.components on c2_vpr import. * build_vpr_strategy factory with 7-strategy resolution table, lazy import, BUILD_VPR_<variant> gating, ImportError→ StrategyNotAvailableError mapping, and pre-flight descriptor_dim match against DescriptorIndex.descriptor_dim() — mismatch fires ConfigError at startup, NOT at first frame. Contract change vs the v1.0.0 draft: factory takes descriptor_index: DescriptorIndex (not tile_store: TileStore) because descriptor_dim() lives on DescriptorIndex per C6's Public API. The contract markdown is updated to match. Architecture: VprCandidate.tile_id is a plain (zoom, lat, lon) tuple, keeping _types/ (L1) free of any c6_tile_cache (L3) import per module-layout.md. Consumers reconstruct TileId at the C6 boundary. Excluded per task spec: * Concrete backbones (AZ-337..AZ-340). * FAISS HNSW retrieve wiring (AZ-341). * DescriptorNormaliser helper (AZ-283, already shipped). * AC-9 single-thread binding — deferred per task spec Risk 4 until the generic compose_root thread-binding registry is in place (today each factory owns its own, e.g. fc_factory). Tests: 45 ACs + NFRs in tests/unit/c2_vpr/test_protocol_conformance.py covering AC-1..AC-8, the error family, the config validation, the factory NFR (p99 ≤ 50 ms). The legacy smoke test is removed. Full sweep 973 passed, 2 skipped (CI-only cmake / actionlint). Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -1,183 +0,0 @@
|
||||
# C2 VPR Strategy Protocol + Factory + Composition
|
||||
|
||||
**Task**: AZ-336_c2_vpr_strategy_protocol
|
||||
**Name**: C2 `VprStrategy` Protocol + Factory + Composition
|
||||
**Description**: Define the public `VprStrategy` Protocol (PEP 544 structural interface), the `BackbonePreprocessor` C2-internal helper Protocol, the C2 DTOs (`VprQuery`, `VprCandidate`, `VprResult`), the error hierarchy (`VprError` family with `VprBackboneError`, `VprPreprocessError`, `IndexUnavailableError`), and the composition-root factory `build_vpr_strategy(config, tile_store, inference_runtime) -> VprStrategy` that selects the concrete backbone at startup based on `config.vpr.strategy` with lazy import + `BUILD_VPR_<variant>` flag gating per ADR-002. Includes a pre-flight `descriptor_dim()` ↔ C6 corpus sidecar `descriptor_dim` validation that fires at startup (NOT at first frame). This task delivers the foundational scaffolding every other C2 task depends on; no concrete backbone is implemented here.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-270_compose_root, AZ-303_c6_storage_interfaces (for `TileStore` + descriptor_dim sidecar), AZ-297_c7_runtime_protocol (for `InferenceRuntime` interface), AZ-266_log_module
|
||||
**Component**: c2_vpr (epic AZ-255 / E-C2)
|
||||
**Tracker**: AZ-336
|
||||
**Epic**: AZ-255 (E-C2)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — the public contract this task implements (Protocol surface + DTOs + error hierarchy + factory signature + invariants + test cases).
|
||||
- `_docs/02_document/components/02_c2_vpr/description.md` — § 2 `VprStrategy` interface table + DTOs + § 5 error handling + § 6 helpers + § 9 logging.
|
||||
- `_docs/02_document/module-layout.md` — § Per-Component Mapping `c2_vpr` (Public API + Internal + Owns + Imports from); § Build-Time Exclusion Map `BUILD_VPR_<variant>` row; § Layering — Layer 3.
|
||||
- `_docs/02_document/architecture.md` — ADR-001 (Strategy + composition root), ADR-002 (build-time exclusion via CMake `BUILD_*` flags), ADR-009 (interface-first DI, composition root the only place that imports concrete strategies).
|
||||
- `_docs/02_document/contracts/c6_tile_cache/descriptor_index.md` — descriptor index sidecar format (used by the pre-flight dim-match validation).
|
||||
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — `InferenceRuntime` interface consumed by every concrete backbone (referenced in the factory signature; not implemented in this task).
|
||||
|
||||
## Problem
|
||||
|
||||
Without this task, every concrete backbone (AZ-337..AZ-340) and the FAISS wiring (AZ-341) and the downstream consumer C2.5 ReRanker (AZ-256) would each invent their own ad-hoc interface, breaking three architectural invariants:
|
||||
|
||||
- **ADR-001 (Strategy)**: backbones must be swappable at composition time; without a shared Protocol, swapping requires rewriting every consumer.
|
||||
- **ADR-002 (build-time exclusion)**: each backbone is gated by `BUILD_VPR_<variant>`; without the lazy-import factory, any single TRT engine compile failure cascades into a hard import error at runtime, defeating per-binary exclusion.
|
||||
- **ADR-009 (interface-first DI)**: the composition root must be the single place that knows about concrete strategy classes; consumers (C2.5, runtime root) must hold typed references to the Protocol only. Without the Protocol, every consumer would import the concrete `UltraVprStrategy` directly.
|
||||
|
||||
The `descriptor_dim` validation also matters: without it, a config that points at the UltraVPR strategy (D=512) but the corpus index built for NetVLAD (D=4096) silently produces garbage retrievals — the kind of mismatch that should crash at startup, NOT at the first frame after takeoff.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `src/gps_denied_onboard/components/c2_vpr/interface.py` defining:
|
||||
- `VprStrategy` Protocol with `embed_query`, `retrieve_topk`, `descriptor_dim` (PEP 544 structural with `@runtime_checkable`).
|
||||
- `BackbonePreprocessor` Protocol with `preprocess`, `input_shape`.
|
||||
- All seven invariants from the contract documented in the Protocol's docstring.
|
||||
- `src/gps_denied_onboard/components/c2_vpr/__init__.py` re-exporting the Protocol + DTOs (Public API per module-layout `c2_vpr` mapping).
|
||||
- `src/gps_denied_onboard/_types/vpr.py` defining the three frozen + slotted dataclasses: `VprQuery`, `VprCandidate`, `VprResult`. Added under shared `_types/` because `VprResult` is consumed cross-component (by C2.5 ReRanker).
|
||||
- `src/gps_denied_onboard/components/c2_vpr/errors.py` defining `VprError`, `VprBackboneError`, `VprPreprocessError`, `IndexUnavailableError`.
|
||||
- `src/gps_denied_onboard/runtime_root/vpr_factory.py` exporting `build_vpr_strategy(config, tile_store, inference_runtime) -> VprStrategy`. The function:
|
||||
1. Reads `config.vpr.strategy` (one of: `ultra_vpr`, `net_vlad`, `mega_loc`, `mix_vpr`, `sela_vpr`, `eigen_places`, `salad`).
|
||||
2. Lazy-imports the concrete module via `importlib.import_module(f"gps_denied_onboard.components.c2_vpr.{config.vpr.strategy}")`. ImportError → `ConfigurationError(f"BUILD_VPR_{strategy.upper()} is OFF for this binary; cannot select strategy={strategy}")`.
|
||||
3. Constructs the strategy via its module-level `create(config, tile_store, inference_runtime)` factory function (each concrete strategy module exports `create` as its public entry-point — keeps `__init__.py` re-exports minimal).
|
||||
4. Pre-flight validation: queries `strategy.descriptor_dim()`; queries `tile_store.descriptor_dim()` (a small read of the FAISS index sidecar); on mismatch raises `ConfigurationError(f"descriptor_dim mismatch: strategy={strategy_dim}, corpus={corpus_dim}")`.
|
||||
5. Returns the instance.
|
||||
- Composition-root `compose_root` extension: invoke `build_vpr_strategy` and bind the result to the C2 ingest thread (single-thread invariant per INV-1).
|
||||
- Config schema extension to AZ-269: `config.vpr.strategy` (enum), `config.vpr.backbone_weights_path` (path), `config.vpr.faiss_index_path` (path); validated at config load.
|
||||
- INFO log on every successful `build_vpr_strategy`: `kind="c2.vpr.strategy_loaded"` with strategy name + `descriptor_dim` value. ERROR log on `ConfigurationError` (with the specific dim mismatch or missing flag).
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- The two Protocols (`VprStrategy`, `BackbonePreprocessor`) + their docstrings encoding all seven invariants from the contract.
|
||||
- The three DTOs in `_types/vpr.py`.
|
||||
- The four-class error hierarchy in `c2_vpr/errors.py`.
|
||||
- The composition-root factory `build_vpr_strategy` with lazy-import + ImportError → `ConfigurationError` mapping + pre-flight `descriptor_dim` validation.
|
||||
- Config schema extension for `config.vpr.{strategy, backbone_weights_path, faiss_index_path}`.
|
||||
- Strategy resolution table comment in `vpr_factory.py` matching the contract's table verbatim.
|
||||
- Unit tests covering: Protocol conformance for a fake strategy, factory rejection on missing flag (lazy-import → ImportError → `ConfigurationError`), factory rejection on dim mismatch, pre-flight INFO log emission, DTO immutability + slot enforcement.
|
||||
- INFO / ERROR log emission per description.md § 9.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Any concrete backbone implementation — owned by AZ-337 (UltraVPR), AZ-338 (NetVLAD), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD).
|
||||
- FAISS HNSW retrieve wiring — owned by AZ-341.
|
||||
- The `DescriptorNormaliser` helper — already AZ-283 (E-CC-HELPERS).
|
||||
- Component-internal tests beyond Protocol-conformance + factory-validation: C2-IT-01 / C2-IT-02 / C2-IT-03 / C2-IT-04 / C2-PT-01 / C2-ST-01 are deferred to Step 9 / E-BBT.
|
||||
- The C7 `InferenceRuntime` interface itself — owned by AZ-297; this task consumes the interface in the factory signature, does NOT define it.
|
||||
- The C6 `TileStore` interface itself — owned by AZ-303; this task consumes the interface (`tile_store.descriptor_dim()` for pre-flight match) and the `TileStore` Public API at `components/c6_tile_cache/__init__.py`.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Protocol conformance — fake strategy passes `runtime_checkable`**
|
||||
Given a `FakeVprStrategy` test double implementing `embed_query`, `retrieve_topk`, `descriptor_dim`
|
||||
When `isinstance(fake, VprStrategy)` is evaluated
|
||||
Then the result is `True`; the same evaluation against an object missing any one method returns `False`
|
||||
|
||||
**AC-2: DTO immutability + slots**
|
||||
Given a constructed `VprQuery`, `VprCandidate`, `VprResult`
|
||||
When attempting to mutate any field via attribute assignment
|
||||
Then `FrozenInstanceError` is raised; `__slots__` is non-empty (verified via `cls.__slots__`); the dataclasses use `frozen=True, slots=True`
|
||||
|
||||
**AC-3: Factory rejects missing build flag — ImportError → ConfigurationError**
|
||||
Given `config.vpr.strategy = "vins_mono"` (a non-existent C2 strategy that simulates a missing build flag) AND a `tile_store` test double AND a `inference_runtime` test double
|
||||
When `build_vpr_strategy(config, tile_store, inference_runtime)` is called
|
||||
Then `ConfigurationError` is raised with message containing `"BUILD_VPR_VINS_MONO is OFF"`; ONE ERROR log `kind="c2.vpr.build_flag_off"` is emitted
|
||||
|
||||
**AC-4: Factory rejects descriptor_dim mismatch**
|
||||
Given `config.vpr.strategy = "ultra_vpr"` (FakeUltraVpr returns `descriptor_dim() = 512`) AND `tile_store.descriptor_dim()` returns 4096
|
||||
When `build_vpr_strategy(...)` is called
|
||||
Then `ConfigurationError` is raised with message containing `"descriptor_dim mismatch: strategy=512, corpus=4096"`; ONE ERROR log `kind="c2.vpr.dim_mismatch"` is emitted; the strategy is NOT bound to the runtime root
|
||||
|
||||
**AC-5: Successful factory load emits INFO log**
|
||||
Given `config.vpr.strategy = "ultra_vpr"` AND matching `descriptor_dim` AND a valid lazy-importable `ultra_vpr` test double module
|
||||
When `build_vpr_strategy(...)` is called
|
||||
Then a `VprStrategy` instance is returned; ONE INFO log `kind="c2.vpr.strategy_loaded"` is emitted with structured fields `{strategy: "ultra_vpr", descriptor_dim: 512}`
|
||||
|
||||
**AC-6: Strategy resolution table — every entry resolves to its module path**
|
||||
Given each of the seven valid `config.vpr.strategy` values
|
||||
When `build_vpr_strategy` is called with each (assuming the module exists as a test double)
|
||||
Then each call returns a `VprStrategy` instance; the resolved module path matches the contract's strategy resolution table verbatim (`gps_denied_onboard.components.c2_vpr.<strategy>`)
|
||||
|
||||
**AC-7: Error hierarchy — every concrete error is catchable as `VprError`**
|
||||
Given test instances of `VprBackboneError`, `VprPreprocessError`, `IndexUnavailableError`
|
||||
When caught by `except VprError`
|
||||
Then all three are caught; `isinstance(err, VprError)` is `True` for each
|
||||
|
||||
**AC-8: Public API surface — `__init__.py` re-exports**
|
||||
Given `from gps_denied_onboard.components.c2_vpr import VprStrategy, VprQuery, VprResult`
|
||||
When the import is evaluated
|
||||
Then all three names resolve; `BackbonePreprocessor` is NOT in the Public API (C2-internal only — verified by `BackbonePreprocessor not in c2_vpr.__all__`)
|
||||
|
||||
**AC-9: Strategy bound to single ingest thread by composition root**
|
||||
Given a `compose_root(config)` invocation that wires C2
|
||||
When the resulting strategy is bound
|
||||
Then the strategy is bound to exactly one ingest thread (verifiable via the runtime root's thread-binding registry); a second binding attempt to the same strategy raises `RuntimeError`
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- `build_vpr_strategy` p99 ≤ 50 ms — the factory itself is a config read + lazy import + one `descriptor_dim()` call + one `tile_store.descriptor_dim()` call. Most of the construction cost lives inside the concrete strategy's `create(...)` function (TRT engine load — owned by AZ-337..AZ-340), NOT in this task.
|
||||
- Pre-flight validation overhead is bounded by the C6 sidecar read: ≤ 5 ms at p99.
|
||||
|
||||
**Compatibility**
|
||||
- The `VprStrategy` Protocol is a major API surface; any change to method signatures is a breaking change requiring a coordinated update of every implementation (lockstep — see Versioning in the contract).
|
||||
- DTO field additions follow the standard "frozen dataclass + new optional field with default" pattern.
|
||||
|
||||
**Reliability**
|
||||
- Lazy-import via `importlib.import_module` — a build-time-excluded backbone's import never executes (no native library load attempted, no CUDA initialisation, no TRT runtime instantiation).
|
||||
- Pre-flight `descriptor_dim` validation catches the silent-garbage failure mode (config + corpus mismatch) at startup.
|
||||
- Single-thread invariant enforced by composition root binding (AC-9); the strategy itself is not responsible for thread safety.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | `runtime_checkable` Protocol conformance | Fake strategy passes; partial fake fails |
|
||||
| AC-2 | DTO immutability + slots | `FrozenInstanceError` on mutation; `__slots__` non-empty |
|
||||
| AC-3 | Factory + nonexistent backbone module | `ConfigurationError("BUILD_VPR_<NAME> is OFF")`; ERROR log emitted |
|
||||
| AC-4 | Factory + dim mismatch | `ConfigurationError("descriptor_dim mismatch: strategy=X, corpus=Y")`; ERROR log emitted; no binding |
|
||||
| AC-5 | Factory + valid load | Strategy instance returned; INFO log emitted with structured fields |
|
||||
| AC-6 | Each of 7 strategy values | Each resolves to correct module path |
|
||||
| AC-7 | Error catchability | All three concrete errors caught by `except VprError` |
|
||||
| AC-8 | Public API re-exports | `VprStrategy`, `VprQuery`, `VprResult` resolve; `BackbonePreprocessor` not in Public API |
|
||||
| AC-9 | Single-thread binding | First binding succeeds; second on same instance raises `RuntimeError` |
|
||||
| NFR-perf-factory | Microbench `build_vpr_strategy` × 100 with mock concretes | p99 ≤ 50 ms |
|
||||
| NFR-perf-validate | Microbench pre-flight `descriptor_dim` check × 100 | p99 ≤ 5 ms |
|
||||
|
||||
## Constraints
|
||||
|
||||
- **No business logic beyond Protocol + factory + DTOs + errors.** The factory's pre-flight `descriptor_dim` check is the ONLY runtime computation this task performs.
|
||||
- **Lazy import is mandatory** — direct `from gps_denied_onboard.components.c2_vpr.ultra_vpr import UltraVprStrategy` in the factory is forbidden (would defeat ADR-002 build-time exclusion).
|
||||
- **`@runtime_checkable` MUST be used** — INV-1 isolates the binding-side enforcement of single-thread invariant; runtime_checkable lets composition root assert via `isinstance` without forcing every consumer to import the Protocol.
|
||||
- **DTOs MUST be `frozen=True, slots=True`** — immutability prevents accidental mutation across thread boundaries; slots reduces memory footprint at 3 Hz frame rate × N seconds.
|
||||
- **Strategy modules export `create(config, tile_store, inference_runtime)` as their entry-point** — keeps the factory's lazy-import surface uniform; per-strategy constructors stay private.
|
||||
- **`BackbonePreprocessor` is C2-internal** — must NOT be re-exported from `c2_vpr/__init__.py` (would violate description.md § 6 "C2-internal helper, NOT a shared helper").
|
||||
- **Config schema field `config.vpr.strategy` is an enum** validated at config load — typo'd values fail before the factory runs.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: `runtime_checkable` Protocol checks have known performance cost**
|
||||
- *Risk*: `isinstance(obj, RuntimeCheckableProtocol)` walks the method table; called per-frame at 3 Hz × 7 strategies it could add measurable overhead.
|
||||
- *Mitigation*: `isinstance` is called ONCE at composition-root binding time (AC-9), NOT per-frame. The per-frame path uses the bound concrete reference. Test asserts the binding-time check is the only `isinstance` call site.
|
||||
|
||||
**Risk 2: Lazy-import error message obscures the real failure mode**
|
||||
- *Risk*: A native library (e.g., FAISS or TensorRT) failing to load triggers `ImportError` from the lazy import, which the factory currently maps to "BUILD flag OFF" — but the actual cause may be a missing `.so` or version mismatch.
|
||||
- *Mitigation*: The factory catches `ImportError`, inspects `e.msg`; if the message contains "No module named" → "BUILD flag OFF" (the build-time-excluded case); otherwise re-raises the original ImportError preserving the native-library context. AC-3 covers the build-flag case; a separate test covers the native-library load case.
|
||||
|
||||
**Risk 3: `descriptor_dim` mismatch is detected too late if the corpus sidecar is corrupted**
|
||||
- *Risk*: A bit-flipped corpus sidecar reports the wrong `descriptor_dim`; the factory passes the validation but every retrieval returns garbage.
|
||||
- *Mitigation*: The C6 sidecar uses the AZ-280 `Sha256Sidecar` pattern; corruption is detected at sidecar load time (C6's responsibility, NOT this task's). The factory's contract is "match the sidecar's declared dim"; if the sidecar itself is wrong, that's a C6 bug.
|
||||
|
||||
**Risk 4: `compose_root` thread-binding registry is not yet implemented**
|
||||
- *Risk*: AC-9 references a "thread-binding registry" that AZ-270 (`compose_root`) may not yet provide.
|
||||
- *Mitigation*: This task's Public API is the factory; the runtime root is responsible for thread binding. If AZ-270 has not yet implemented the registry, this task delivers AC-1..AC-8 + a stub `bind_to_thread(strategy)` interface that AZ-270 fills in. AC-9 is gated on AZ-270's progress and may move to a follow-up task if the registry isn't ready. **Decision**: keep AC-9 in this task; if AZ-270 lacks the registry by implementation time, AZ-270 is the upstream blocker — escalate via the standard tracker dependency mechanism.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: cross-strategy `VprStrategy` Protocol + composition-root factory + pre-flight `descriptor_dim` validation + ADR-002 build-time exclusion enforcement (architecture / E-C2 / `solution.md` "Strategy + multiple backbones" / ADR-001 + ADR-002 + ADR-009).
|
||||
- **Production code that must exist**: real `VprStrategy` Protocol + real DTOs + real error hierarchy + real `build_vpr_strategy` factory with real lazy-import + real ImportError mapping + real `descriptor_dim` validation + real config schema extension.
|
||||
- **Allowed external stubs**: tests MAY use `FakeVprStrategy`, `FakeBackbonePreprocessor`, `FakeTileStore` returning a fixed `descriptor_dim`, `FakeInferenceRuntime`. Production wiring uses real concrete strategies (selected from AZ-337..AZ-340 at composition time) + the real C6 `TileStore` + the real C7 `InferenceRuntime`.
|
||||
- **Unacceptable substitutes**: direct `from gps_denied_onboard.components.c2_vpr.ultra_vpr import UltraVprStrategy` in the factory (would defeat ADR-002); a `Type[VprStrategy]` registry that pre-imports all 7 backbones (would defeat lazy-import); skipping the `descriptor_dim` pre-flight check (would let dim mismatch crash at first frame instead of startup); using `frozen=False` dataclasses (would let consumers mutate `VprResult` candidates list); making `BackbonePreprocessor` part of the Public API (would let other components import it, violating description.md § 6).
|
||||
Reference in New Issue
Block a user