[AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition

Foundational scaffolding for the InlierCountReRanker (AZ-343) and
the future C3 CrossDomainMatcher consumer (AZ-344). No concrete
re-ranker is implemented here.

* ReRankStrategy Protocol (single rerank(frame, vpr_result, n,
  calibration) -> RerankResult method) with all 8 invariants in the
  docstring — notably INV-8 drop-and-continue (per-candidate failure
  NEVER propagates unless every candidate fails).
* DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult;
  frozen+slots; tuple-not-list for RerankResult.candidates; tile_id
  encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any
  c6_tile_cache (L3) import per module-layout.md.
* Error family: RerankError + RerankBackboneError +
  RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError
  escapes rerank(); RerankBackboneError is caught inside the per-
  candidate loop, logged ERROR, FDR-stamped, candidate dropped.
* C2_5RerankConfig (strategy enum default "inlier_count", top_n int
  default 3) with strict validation at load; registered into
  Config.components on c2_5_rerank import.
* build_rerank_strategy(config, *, tile_store, lightglue_runtime)
  factory: 1-strategy resolution table, lazy import,
  BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError
  mapping. The shared LightGlueRuntime is constructor-injected
  (R14 fix: neither C2.5 nor C3 owns its lifecycle).

Renamed the Protocol from the existing stub "RerankStrategy" to
"ReRankStrategy" to match the contract; updated module-layout.md.
Removed the legacy RerankResult shape from _types/vpr.py — the
v1.0.0 shape lives in _types/rerank.py.

Excluded per task spec:
* Concrete InlierCountReRanker (AZ-343).
* C3 matcher protocol task (AZ-344, next in batch).
* AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share
  between C2.5/C3 — deferred per task spec Risk 3 until the generic
  compose_root thread-binding registry and the C3 factory both land.

Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in
tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy
smoke test is removed. Full sweep: 997 passed (one pre-existing
flake in test_az296_takeoff_abort, subprocess timing, unrelated to
this commit; passes in isolation).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 05:31:27 +03:00
parent 3665acef66
commit d6756f1855
12 changed files with 871 additions and 54 deletions
@@ -1,199 +0,0 @@
# C2.5 ReRank Strategy Protocol + Factory + Composition
**Task**: AZ-342_c2_5_rerank_strategy_protocol
**Name**: C2.5 `ReRankStrategy` Protocol + Factory + Composition
**Description**: Define the public `ReRankStrategy` Protocol (PEP 544 structural interface), the C2.5 DTOs (`RerankCandidate`, `RerankResult`), the error hierarchy (`RerankError` family with `RerankBackboneError`, `RerankAllCandidatesFailedError`), and the composition-root factory `build_rerank_strategy(config, tile_store, lightglue_runtime) -> ReRankStrategy` that selects the concrete re-ranker at startup based on `config.rerank.strategy` with lazy import + `BUILD_RERANK_<variant>` flag gating per ADR-002. The shared `LightGlueRuntime` helper (AZ-278 / E-CC-HELPERS) is constructor-injected — neither C2.5 nor C3 owns its lifecycle (R14 fix). This task delivers the foundational scaffolding `InlierCountReRanker` (AZ-343) depends on; no concrete re-ranker is implemented here.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-270_compose_root, AZ-278_lightglue_runtime (helper handle consumed via composition root), AZ-303_c6_storage_interfaces (for `TileStore` Public API), AZ-266_log_module
**Component**: c2_5_rerank (epic AZ-256 / E-C2.5)
**Tracker**: AZ-342
**Epic**: AZ-256 (E-C2.5)
### Document Dependencies
- `_docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md` — the public contract this task implements (Protocol surface + DTOs + error hierarchy + factory signature + invariants + test cases).
- `_docs/02_document/components/03_c2_5_rerank/description.md` — § 1 architectural pattern (Strategy); § 2 `ReRankStrategy` interface + DTOs; § 5 error handling; § 6 helper ownership (R14 resolution); § 9 logging.
- `_docs/02_document/module-layout.md` — § Per-Component Mapping `c2_5_rerank` (Public API + Internal + Owns + Imports from); § shared/helpers/lightglue_runtime row (R14 helper-ownership decision); § Layering — Layer 3.
- `_docs/02_document/architecture.md` — ADR-001 (Strategy + composition root), ADR-002 (build-time exclusion via CMake `BUILD_*` flags), ADR-009 (interface-first DI; composition root the only place that imports concrete strategies).
- `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — `VprResult` DTO (consumed by `rerank`).
- `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md` — `LightGlueRuntime` helper handle consumed by the factory (constructor-injected, NOT instantiated here).
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — `TileStore.get_tile_pixels` Public API consumed by the strategy (page-cache-backed reference, not a copy).
## Problem
Without this task, `InlierCountReRanker` (AZ-343) and the downstream consumer C3 CrossDomainMatcher (AZ-257) would each invent their own ad-hoc interface, breaking three architectural invariants:
- **ADR-001 (Strategy)**: re-rank algorithms must be swappable at composition time; without a shared Protocol, swapping (e.g., adding a learned re-ranker in a future cycle) requires rewriting every consumer.
- **ADR-002 (build-time exclusion)**: each re-ranker is gated by `BUILD_RERANK_<variant>`; without the lazy-import factory, any single missing module cascades into a hard import error at runtime, defeating per-binary exclusion.
- **ADR-009 (interface-first DI)**: the composition root must be the single place that knows about concrete re-ranker classes; consumers (C3, runtime root) hold typed references to the Protocol only. Without the Protocol, every consumer would import the concrete `InlierCountReRanker` directly.
The drop-and-continue contract (Invariant 8) also matters: without it codified in the Protocol's docstring, an implementer might let a per-candidate failure abort the whole `rerank` call, breaking C3's expectation of partial input tolerance and pushing more flights into the `RerankAllCandidatesFailedError` → VIO-only fallback path than necessary.
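The drop-and-continue clause can be illustrated with a minimal sketch. The error class names follow this spec; `rerank_drop_and_continue`, the `score_candidate` callable, the string candidates, and the logger name are hypothetical stand-ins, since the real per-candidate work belongs to AZ-343.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("c2_5_rerank.sketch")  # hypothetical logger name


class RerankError(Exception):
    """Base of the C2.5 error family."""


class RerankBackboneError(RerankError):
    """A single candidate failed inside the matching backbone."""


class RerankAllCandidatesFailedError(RerankError):
    """No candidate survived; the only error meant to escape rerank()."""


def rerank_drop_and_continue(
    candidates: Sequence[str],
    score_candidate: Callable[[str], int],  # hypothetical per-candidate scorer
    n: int,
) -> tuple[tuple[str, int], ...]:
    survivors: list[tuple[str, int]] = []
    for cand in candidates:
        try:
            survivors.append((cand, score_candidate(cand)))
        except RerankBackboneError:
            # Invariant 8: log the failure, drop this candidate, keep going.
            logger.error("dropping candidate %s after backbone failure", cand)
    if not survivors:
        # Assumption: an empty input is treated like "every candidate failed".
        raise RerankAllCandidatesFailedError("all candidates failed")
    survivors.sort(key=lambda pair: pair[1], reverse=True)
    return tuple(survivors[:n])
```

With four candidates of which one raises `RerankBackboneError`, the call still returns the best `n` of the remaining three; only a run in which nothing survives reaches the caller as an error.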
## Outcome
- `src/gps_denied_onboard/components/c2_5_rerank/interface.py` defining:
  - `ReRankStrategy` Protocol with `rerank(frame, vpr_result, n, calibration) -> RerankResult` (PEP 544 structural with `@runtime_checkable`).
  - All eight invariants from the contract documented in the Protocol's docstring.
- `src/gps_denied_onboard/components/c2_5_rerank/__init__.py` re-exporting the Protocol + DTOs (Public API per module-layout `c2_5_rerank` mapping: `ReRankStrategy`, `RerankResult`).
- `src/gps_denied_onboard/_types/rerank.py` defining the two frozen + slotted dataclasses: `RerankCandidate`, `RerankResult`. Added under shared `_types/` because `RerankResult` is consumed cross-component (by C3 CrossDomainMatcher).
- `src/gps_denied_onboard/components/c2_5_rerank/errors.py` defining `RerankError`, `RerankBackboneError`, `RerankAllCandidatesFailedError`.
- `src/gps_denied_onboard/runtime_root/rerank_factory.py` exporting `build_rerank_strategy(config, tile_store, lightglue_runtime) -> ReRankStrategy`. The function:
  1. Reads `config.rerank.strategy` (currently only `"inlier_count"` is defined).
  2. Lazy-imports the concrete module via `importlib.import_module(f"gps_denied_onboard.components.c2_5_rerank.{module_name}")` per the strategy resolution table in the contract.
  3. ImportError where `e.msg` contains "No module named" → `ConfigurationError(f"BUILD_RERANK_{strategy.upper()} is OFF for this binary; cannot select strategy={strategy}")`. Other ImportErrors (native library load failures) re-raised unchanged.
  4. Constructs the strategy via its module-level `create(config, tile_store, lightglue_runtime)` factory function (each concrete re-ranker module exports `create` as its public entry-point — keeps `__init__.py` re-exports minimal).
  5. Returns the instance. The runtime root binds it to one ingest thread.
- Composition-root `compose_root` extension: invoke `build_rerank_strategy` after `LightGlueRuntime` is constructed; bind the result to the same ingest thread as C2 (single-thread invariant per INV-1; this is also C3's thread, since both share `LightGlueRuntime`).
- Config schema extension to AZ-269: `config.rerank.strategy` (enum, default `"inlier_count"`), `config.rerank.top_n` (int, default 3), validated at config load.
- INFO log on every successful `build_rerank_strategy`: `kind="c2_5.rerank.strategy_loaded"` with strategy name + `top_n`. ERROR log on `ConfigurationError` (with the missing flag detail).
## Scope
### Included
- The `ReRankStrategy` Protocol + its docstring encoding all eight invariants from the contract.
- The two DTOs in `_types/rerank.py` (`RerankCandidate`, `RerankResult`).
- The three-class error hierarchy in `c2_5_rerank/errors.py`.
- The composition-root factory `build_rerank_strategy` with lazy-import + ImportError → `ConfigurationError` mapping.
- Config schema extension for `config.rerank.{strategy, top_n}`.
- Strategy resolution table comment in `rerank_factory.py` matching the contract's table verbatim.
- Composition-root wiring path that constructs `LightGlueRuntime` ONCE and passes the same reference to both `build_rerank_strategy` and `build_matcher_strategy` (the C3 factory; cross-task coordination point with AZ-257's protocol task).
- Unit tests covering: Protocol conformance for a fake strategy, factory rejection on missing flag (lazy-import → ImportError → `ConfigurationError`), factory acceptance for the valid `"inlier_count"` value, INFO log emission, DTO immutability + slot enforcement, error hierarchy catchability.
- INFO / ERROR log emission per description.md § 9.
### Excluded
- Any concrete re-ranker implementation — owned by AZ-343 (`InlierCountReRanker`).
- The `LightGlueRuntime` helper itself — already owned by AZ-278 (E-CC-HELPERS); this task consumes the constructor-injected handle.
- The C6 `TileStore` interface itself — owned by AZ-303; this task references the Public API in the factory signature.
- Component-internal tests beyond Protocol-conformance + factory-validation: C2.5-IT-01 (top-1 promotion rate), C2.5-IT-02 (drop-and-continue smoke), C2.5-IT-03 (helper serial-access), C2.5-PT-01 (latency NFR) are deferred to Step 9 / E-BBT.
- C3 matcher's protocol task and factory — owned by AZ-257's component decomposition.
## Acceptance Criteria
**AC-1: Protocol conformance — fake strategy passes `runtime_checkable`**
Given a `FakeReRankStrategy` test double implementing `rerank`
When `isinstance(fake, ReRankStrategy)` is evaluated
Then the result is `True`; the same evaluation against an object missing `rerank` returns `False`
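A minimal sketch of the AC-1 check, assuming placeholder `Any` types for the frame, VPR result, and calibration arguments (the real DTO types live in the contract). `FakeReRankStrategy` is named in this spec; `NotAStrategy` is a hypothetical negative case.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReRankStrategy(Protocol):
    def rerank(self, frame: Any, vpr_result: Any, n: int, calibration: Any) -> Any:
        """Re-rank VPR candidates (the real docstring carries the eight invariants)."""
        ...


class FakeReRankStrategy:
    """Test double: conforms structurally without inheriting from the Protocol."""

    def rerank(self, frame: Any, vpr_result: Any, n: int, calibration: Any) -> Any:
        return ()


class NotAStrategy:
    """Has no rerank method, so it must fail the isinstance check."""


print(isinstance(FakeReRankStrategy(), ReRankStrategy))
print(isinstance(NotAStrategy(), ReRankStrategy))
```

Note that `isinstance` against a `runtime_checkable` Protocol only verifies that the method name is present; it does not compare signatures or return types, so AC-1 proves structural presence rather than full contract compliance.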
**AC-2: DTO immutability + slots**
Given a constructed `RerankCandidate`, `RerankResult`
When attempting to mutate any field via attribute assignment
Then `FrozenInstanceError` is raised; `__slots__` is non-empty (verified via `cls.__slots__`); the dataclasses use `frozen=True, slots=True`
**AC-3: Factory rejects missing build flag — ImportError → ConfigurationError**
Given `config.rerank.strategy = "nonexistent_reranker"` (a non-existent module that simulates a missing build flag) AND a `tile_store` test double AND a `lightglue_runtime` test double
When `build_rerank_strategy(config, tile_store, lightglue_runtime)` is called
Then `ConfigurationError` is raised with message containing `"BUILD_RERANK_NONEXISTENT_RERANKER is OFF"`; ONE ERROR log `kind="c2_5.rerank.build_flag_off"` is emitted
**AC-4: Factory rejects unknown strategy at config-load time**
Given `config.rerank.strategy = "garbage"` AND the strategy resolution table does NOT contain "garbage"
When `load_config(...)` is called
Then `ConfigurationError` is raised at config-load time (the enum validation), NOT at factory time; the factory is never invoked
**AC-5: Successful factory load emits INFO log**
Given `config.rerank.strategy = "inlier_count"` AND `config.rerank.top_n = 3` AND a valid lazy-importable `inlier_based_reranker` test double module
When `build_rerank_strategy(...)` is called
Then a `ReRankStrategy` instance is returned; ONE INFO log `kind="c2_5.rerank.strategy_loaded"` is emitted with structured fields `{strategy: "inlier_count", top_n: 3}`
**AC-6: Strategy resolution table — every entry resolves to its module path**
Given each valid `config.rerank.strategy` value (currently only `"inlier_count"`)
When `build_rerank_strategy` is called (assuming the module exists as a test double)
Then the call returns a `ReRankStrategy` instance; the resolved module path matches the contract's strategy resolution table verbatim (`gps_denied_onboard.components.c2_5_rerank.inlier_based_reranker`)
**AC-7: Error hierarchy — every concrete error is catchable as `RerankError`**
Given test instances of `RerankBackboneError`, `RerankAllCandidatesFailedError`
When caught by `except RerankError`
Then both are caught; `isinstance(err, RerankError)` is `True` for each
**AC-8: Public API surface — `__init__.py` re-exports**
Given `from gps_denied_onboard.components.c2_5_rerank import ReRankStrategy, RerankResult`
When the import is evaluated
Then both names resolve; internal names (e.g., `_validate_inputs`, factory-private helpers) are NOT in the Public API (`__all__` exposes only `ReRankStrategy`, `RerankResult`)
**AC-9: Strategy bound to single ingest thread by composition root**
Given a `compose_root(config)` invocation that wires C2.5
When the resulting strategy is bound
Then the strategy is bound to exactly one ingest thread (verifiable via the runtime root's thread-binding registry); a second binding attempt to the same strategy raises `RuntimeError`
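One way the thread-binding registry behind AC-9 might behave is sketched below. The class name `ThreadBindingRegistry`, the `thread_name` parameter, and `thread_for` are hypothetical; this spec only mentions a `bind_to_thread(strategy)` stub that AZ-270 is expected to fill in.

```python
import threading


class ThreadBindingRegistry:
    """Hypothetical registry: a strategy instance may be bound to one thread only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Keyed by id(); the strategy itself is stored too, so its id cannot be
        # recycled by another object while the binding exists.
        self._bindings: dict[int, tuple[object, str]] = {}

    def bind_to_thread(self, strategy: object, thread_name: str) -> None:
        with self._lock:
            existing = self._bindings.get(id(strategy))
            if existing is not None:
                raise RuntimeError(
                    f"strategy already bound to thread {existing[1]!r}"
                )
            self._bindings[id(strategy)] = (strategy, thread_name)

    def thread_for(self, strategy: object) -> str:
        with self._lock:
            return self._bindings[id(strategy)][1]
```

A test for AC-9 would bind a fake strategy once, read the binding back through the registry, and assert that a second `bind_to_thread` call on the same instance raises `RuntimeError`.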
**AC-10: Composition root passes the SAME `LightGlueRuntime` instance to both C2.5 and C3**
Given a `compose_root(config)` invocation that wires both C2.5 and C3
When the resulting strategies are inspected
Then `c2_5_strategy._lightglue_runtime is c3_strategy._lightglue_runtime` (identity, not equality); ONE INFO log `kind="runtime_root.lightglue_runtime.shared"` is emitted at composition time confirming the shared binding
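The reason AC-10 asks for identity rather than equality can be shown with a small sketch. All names here (`FakeLightGlueRuntime` with its `engine_path` field, `FakeStrategy`, `compose_root_sketch`) are hypothetical test doubles, not the real composition root.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeLightGlueRuntime:
    # Hypothetical field, present only so two separate instances compare equal.
    engine_path: str = "fake.engine"


class FakeStrategy:
    def __init__(self, lightglue_runtime: object) -> None:
        self._lightglue_runtime = lightglue_runtime


def compose_root_sketch() -> tuple[FakeStrategy, FakeStrategy]:
    runtime = FakeLightGlueRuntime()  # constructed exactly once
    c2_5_strategy = FakeStrategy(runtime)  # stands in for build_rerank_strategy(...)
    c3_strategy = FakeStrategy(runtime)    # stands in for build_matcher_strategy(...)
    return c2_5_strategy, c3_strategy


c2_5, c3 = compose_root_sketch()
print(c2_5._lightglue_runtime is c3._lightglue_runtime)
```

Two independently constructed runtimes would still satisfy `==` in this sketch, which is why an equality assertion could pass even when GPU memory is doubled; only the `is` check catches that case.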
**AC-11: `RerankCandidate.tile_pixels_handle` is opaque**
Given a constructed `RerankCandidate(tile_pixels_handle=some_obj)`
When the field is accessed
Then it returns the same `some_obj` (identity); the Protocol does NOT type-restrict the handle (it's `object` by design — C6 owns the actual type)
## Non-Functional Requirements
**Performance**
- `build_rerank_strategy` p99 ≤ 50 ms — the factory itself is a config read + lazy import + one constructor call. The constructor cost lives inside the concrete re-ranker (TRT engine warm-up — owned by AZ-343), NOT in this task.
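A microbenchmark for this budget might be structured as below. The helper names `p99` and `bench_ms` are hypothetical, and the nearest-rank percentile definition is an assumption, since the spec does not say how p99 is computed.

```python
import time
from typing import Callable, Sequence


def p99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile: the ceil(0.99 * N)-th smallest sample."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * N), 1-indexed
    return ordered[rank - 1]


def bench_ms(fn: Callable[[], object], runs: int = 100) -> float:
    """Call fn `runs` times and return the p99 wall-clock latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return p99(samples)
```

In the NFR-perf-factory test, `fn` would wrap `build_rerank_strategy` with a mock concrete module, and the assertion would be `bench_ms(fn) <= 50.0`.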
**Compatibility**
- The `ReRankStrategy` Protocol is a major API surface; any change to method signature is a breaking change requiring a coordinated update of every implementation (lockstep — see Versioning in the contract).
- DTO field additions follow the standard "frozen dataclass + new optional field with default" pattern.
- The drop-and-continue contract (Invariant 8) is non-negotiable; documented in the Protocol's docstring as a contract clause that implementations MUST satisfy.
**Reliability**
- Lazy-import via `importlib.import_module` — a build-time-excluded re-ranker's import never executes (no native library load attempted, no CUDA initialisation).
- Single-thread invariant enforced by composition root binding (AC-9); the strategy itself is not responsible for thread safety.
- Identity-shared `LightGlueRuntime` (AC-10) ensures C2.5 and C3 cannot accidentally use different helper instances (which would either double GPU memory or break the serial-access invariant).
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | `runtime_checkable` Protocol conformance | Fake strategy passes; partial fake fails |
| AC-2 | DTO immutability + slots | `FrozenInstanceError` on mutation; `__slots__` non-empty |
| AC-3 | Factory + nonexistent re-ranker module | `ConfigurationError("BUILD_RERANK_<NAME> is OFF")`; ERROR log emitted |
| AC-4 | Config load + invalid enum | `ConfigurationError` at config-load time; factory never invoked |
| AC-5 | Factory + valid load | Strategy instance returned; INFO log emitted with structured fields |
| AC-6 | Strategy resolution to `inlier_based_reranker` module | Resolves to correct module path |
| AC-7 | Error catchability | Both concrete errors caught by `except RerankError` |
| AC-8 | Public API re-exports | `ReRankStrategy`, `RerankResult` resolve; internals not in `__all__` |
| AC-9 | Single-thread binding | First binding succeeds; second on same instance raises `RuntimeError` |
| AC-10 | `LightGlueRuntime` identity-shared between C2.5 and C3 | `c2_5._lightglue_runtime is c3._lightglue_runtime`; INFO log emitted |
| AC-11 | `tile_pixels_handle` opaqueness | Identity preserved; Protocol does not constrain type |
| NFR-perf-factory | Microbench `build_rerank_strategy` × 100 with mock concrete | p99 ≤ 50 ms |
## Constraints
- **No business logic beyond Protocol + factory + DTOs + errors.** The factory does NOT call `lightglue_runtime` or `tile_store` methods at construction time; those calls happen during `rerank` (per-frame), owned by AZ-343.
- **Lazy import is mandatory** — direct `from gps_denied_onboard.components.c2_5_rerank.inlier_based_reranker import InlierCountReRanker` in the factory is forbidden (would defeat ADR-002 build-time exclusion).
- **`@runtime_checkable` MUST be used** — INV-1 places enforcement of the single-thread invariant on the binding side; `runtime_checkable` lets the composition root assert conformance via `isinstance` without forcing every consumer to import the Protocol.
- **DTOs MUST be `frozen=True, slots=True`** — immutability prevents accidental mutation across thread boundaries; slots reduces memory footprint.
- **Concrete re-ranker modules export `create(config, tile_store, lightglue_runtime)` as their entry-point** — keeps the factory's lazy-import surface uniform; per-strategy constructors stay private.
- **Config schema field `config.rerank.strategy` is an enum** validated at config load — typo'd values fail before the factory runs.
- **The factory does NOT instantiate `LightGlueRuntime`** — that is the runtime root's responsibility, BEFORE this factory runs. AC-10 enforces the identity-share with C3.
## Risks & Mitigation
**Risk 1: `runtime_checkable` Protocol checks have known performance cost**
- *Risk*: `isinstance(obj, RuntimeCheckableProtocol)` walks the method table; called per-frame at 3 Hz it could add measurable overhead.
- *Mitigation*: `isinstance` is called ONCE at composition-root binding time (AC-9), NOT per-frame. The per-frame path uses the bound concrete reference. Test asserts the binding-time check is the only `isinstance` call site against `ReRankStrategy`.
**Risk 2: Lazy-import error message obscures the real failure mode**
- *Risk*: A native library (e.g., LightGlue TRT engine) failing to load triggers `ImportError` from the lazy import, which the factory currently maps to "BUILD flag OFF" — but the actual cause may be a missing `.so` or version mismatch.
- *Mitigation*: The factory catches `ImportError`, inspects `e.msg`; if the message contains "No module named" → "BUILD flag OFF" (the build-time-excluded case); otherwise re-raises the original ImportError preserving the native-library context. AC-3 covers the build-flag case; a separate test covers the native-library load case.
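The message-based discrimination described in this mitigation can be sketched in a few lines. The helper name `is_build_flag_off`, the demo module name, and the shared-object error text are illustrative assumptions.

```python
import importlib


def is_build_flag_off(exc: ImportError) -> bool:
    """Spec rule: 'No module named' in the message means the module was excluded."""
    return "No module named" in (exc.msg or "")


# Case 1: the module is absent, as it would be when its build flag is OFF.
try:
    importlib.import_module("module_that_does_not_exist_for_demo")
except ImportError as exc:
    missing_module_error = exc

# Case 2: the module exists but a native library fails to load (simulated).
native_load_error = ImportError(
    "libexample.so: cannot open shared object file: No such file or directory"
)

print(is_build_flag_off(missing_module_error))
print(is_build_flag_off(native_load_error))
```

One caveat worth covering in the separate native-library test: a re-ranker module that exists but imports a missing pure-Python dependency also produces a "No module named" message. Comparing the exception's `name` attribute against the target module path would be one way to narrow the check, if the contract allows it.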
**Risk 3: `compose_root` thread-binding registry / `LightGlueRuntime` identity-share contract is not yet implemented**
- *Risk*: AC-9 + AC-10 reference a "thread-binding registry" and a shared-helper composition that AZ-270 (`compose_root`) and AZ-278 (helper) may not yet provide.
- *Mitigation*: This task's Public API is the factory; the runtime root is responsible for thread binding and helper sharing. If AZ-270 has not yet implemented the registry, this task delivers AC-1..AC-8 + AC-11 + a stub `bind_to_thread(strategy)` interface that AZ-270 fills in. AC-9 / AC-10 are gated on AZ-270's progress and may move to a follow-up task if the registry isn't ready. **Decision**: keep AC-9 / AC-10 in this task; if AZ-270 lacks the registry by implementation time, AZ-270 is the upstream blocker — escalate via the standard tracker dependency mechanism.
**Risk 4: A future learned re-ranker may need a different constructor signature**
- *Risk*: A future `LearnedReRanker` may need additional dependencies (e.g., a separate `ReRankInferenceRuntime`) that don't fit `create(config, tile_store, lightglue_runtime)`.
- *Mitigation*: The `create` factory pattern is per-module — each module owns its own `create` function. The composition-root factory `build_rerank_strategy` selects the module and invokes its `create`; if a future module needs different deps, the composition root passes them through. Today's signature is `create(config, tile_store, lightglue_runtime)` because every C2.5 strategy will plausibly need those three; if that ever changes, the factory's signature evolves.
## Runtime Completeness
- **Named capability**: `ReRankStrategy` Protocol + composition-root factory + ADR-002 build-time exclusion enforcement (architecture / E-C2.5 / `solution.md` "K=10 → N=3 by single-pair LightGlue inlier count" / ADR-001 + ADR-002 + ADR-009).
- **Production code that must exist**: real `ReRankStrategy` Protocol + real DTOs + real error hierarchy + real `build_rerank_strategy` factory with real lazy-import + real ImportError mapping + real config schema extension + real composition-root wiring path that identity-shares `LightGlueRuntime` with C3.
- **Allowed external stubs**: tests MAY use `FakeReRankStrategy`, `FakeTileStore`, `FakeLightGlueRuntime`. Production wiring uses the real `InlierCountReRanker` (selected from AZ-343 at composition time) + the real C6 `TileStore` + the real shared `LightGlueRuntime` helper.
- **Unacceptable substitutes**: direct `from gps_denied_onboard.components.c2_5_rerank.inlier_based_reranker import InlierCountReRanker` in the factory (would defeat ADR-002); a `Type[ReRankStrategy]` registry that pre-imports all re-rankers (would defeat lazy-import); skipping the identity-share enforcement (AC-10) and constructing a SECOND `LightGlueRuntime` for C2.5 (would double GPU memory and break the serial-access invariant the helper relies on).
## Contract
This task produces/implements the contract at `_docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface.