[AZ-342] C2.5 ReRankStrategy: Protocol + DTOs + factory + composition

Foundational scaffolding for the InlierCountReRanker (AZ-343) and
the future C3 CrossDomainMatcher consumer (AZ-344). No concrete
re-ranker is implemented here.

* ReRankStrategy Protocol (single rerank(frame, vpr_result, n,
  calibration) -> RerankResult method) with all 8 invariants in the
  docstring — notably INV-8 drop-and-continue (per-candidate failure
  NEVER propagates unless every candidate fails).
* DTOs moved to L1 _types/rerank.py — RerankCandidate, RerankResult;
  frozen+slots; tuple-not-list for RerankResult.candidates; tile_id
  encoded as (zoom_level, lat, lon) tuple to keep _types/ free of any
  c6_tile_cache (L3) import per module-layout.md.
* Error family: RerankError + RerankBackboneError +
  RerankAllCandidatesFailedError. Only RerankAllCandidatesFailedError
  escapes rerank(); RerankBackboneError is caught inside the per-
  candidate loop, logged ERROR, FDR-stamped, candidate dropped.
* C2_5RerankConfig (strategy str default "inlier_count", checked
  against KNOWN_STRATEGIES; top_n int default 3, must be >= 1) with
  strict validation at load; registered into Config.components on
  c2_5_rerank import.
* build_rerank_strategy(config, *, tile_store, lightglue_runtime)
  factory: 1-strategy resolution table, lazy import,
  BUILD_RERANK_<variant> gate, ImportError → StrategyNotAvailableError
  mapping. The shared LightGlueRuntime is constructor-injected
  (R14 fix: neither C2.5 nor C3 owns its lifecycle).
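
A minimal sketch of the factory shape described in the last bullet, not the committed implementation. The module path, the reading of the BUILD_RERANK_<variant> gate from an environment-style mapping, the stand-in StrategyNotAvailableError class and the function name build_rerank_strategy_sketch are all assumptions made for illustration; the real factory takes the config object rather than a bare strategy label.

```python
import importlib
import os


class StrategyNotAvailableError(RuntimeError):
    """Stand-in for the project's error of the same name (assumed base class)."""


# strategy label -> (build gate, module path, class name).
# The module path is hypothetical; only the class name comes from the commit message.
_RESOLUTION_TABLE = {
    "inlier_count": (
        "BUILD_RERANK_INLIER_COUNT",
        "gps_denied_onboard.components.c2_5_rerank.inlier_count",
        "InlierCountReRanker",
    ),
}


def build_rerank_strategy_sketch(strategy, *, tile_store, lightglue_runtime, env=os.environ):
    """Resolve a strategy label to an instance, importing its module lazily."""
    try:
        gate, module_path, class_name = _RESOLUTION_TABLE[strategy]
    except KeyError:
        raise StrategyNotAvailableError(f"unknown strategy {strategy!r}") from None

    # Assumption: the gate is exposed as a "0"/"1" flag and defaults to enabled.
    if env.get(gate, "1") == "0":
        raise StrategyNotAvailableError(f"{gate} is disabled")

    try:
        # Lazy import: the concrete module is only loaded when it is selected.
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise StrategyNotAvailableError(
            f"strategy {strategy!r} could not be imported"
        ) from exc

    # The shared runtime is injected; the strategy does not own its lifecycle.
    cls = getattr(module, class_name)
    return cls(tile_store=tile_store, lightglue_runtime=lightglue_runtime)
```

Passing the runtime in as a keyword argument mirrors the R14 fix: the caller creates and disposes of the LightGlueRuntime, so the same handle can later be given to the C3 factory.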

Renamed the Protocol from the existing stub "RerankStrategy" to
"ReRankStrategy" to match the contract; updated module-layout.md.
Removed the legacy RerankResult shape from _types/vpr.py — the
v1.0.0 shape lives in _types/rerank.py.

Excluded per task spec:
* Concrete InlierCountReRanker (AZ-343).
* C3 matcher protocol task (AZ-344, next in batch).
* AC-9 single-thread binding + AC-10 LightGlueRuntime identity-share
  between C2.5/C3 — deferred per task spec Risk 3 until the generic
  compose_root thread-binding registry and the C3 factory both land.

Tests: AC-1..AC-8 + AC-11 + NFR-perf-factory in
tests/unit/c2_5_rerank/test_protocol_conformance.py. The legacy
smoke test is removed. Full sweep: 997 passed (one pre-existing
flake in test_az296_takeoff_abort, subprocess timing, unrelated to
this commit; passes in isolation).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 05:31:27 +03:00
parent 3665acef66
commit d6756f1855
12 changed files with 871 additions and 54 deletions
@@ -1,6 +1,42 @@
"""C2.5 Rerank component — Public API."""
"""C2.5 ReRank — Public API (AZ-342).
from gps_denied_onboard._types.vpr import RerankResult
from gps_denied_onboard.components.c2_5_rerank.interface import RerankStrategy
Per ``rerank_strategy_protocol.md`` v1.0.0 the public surface
consists of:
__all__ = ["RerankResult", "RerankStrategy"]
- :class:`ReRankStrategy` Protocol (one method).
- DTOs re-exported from :mod:`gps_denied_onboard._types.rerank` (the
L1 home for cross-component DTOs): :class:`RerankCandidate`,
:class:`RerankResult`.
- Error family rooted at :class:`RerankError`; two documented
subtypes (:class:`RerankBackboneError`,
:class:`RerankAllCandidatesFailedError`).
- Config block :class:`C2_5RerankConfig` (registered on import).
Concrete strategy (``InlierCountReRanker``, AZ-343) lives in a
sibling module and is imported lazily by
:mod:`gps_denied_onboard.runtime_root.rerank_factory` — Risk-2
mitigation: this ``__init__.py`` MUST NOT import any concrete
strategy module.
"""
from gps_denied_onboard._types.rerank import RerankCandidate, RerankResult
from gps_denied_onboard.components.c2_5_rerank.config import C2_5RerankConfig
from gps_denied_onboard.components.c2_5_rerank.errors import (
    RerankAllCandidatesFailedError,
    RerankBackboneError,
    RerankError,
)
from gps_denied_onboard.components.c2_5_rerank.interface import ReRankStrategy
from gps_denied_onboard.config.schema import register_component_block

register_component_block("c2_5_rerank", C2_5RerankConfig)

__all__ = [
    "C2_5RerankConfig",
    "ReRankStrategy",
    "RerankAllCandidatesFailedError",
    "RerankBackboneError",
    "RerankCandidate",
    "RerankError",
    "RerankResult",
]
@@ -0,0 +1,54 @@
"""C2.5 ReRankStrategy config block (AZ-342).
Registered into ``config.components['c2_5_rerank']`` by the package
``__init__.py``. The composition-root factory
:func:`gps_denied_onboard.runtime_root.rerank_factory.build_rerank_strategy`
reads this block to select the strategy and configure the top-N cut.
``top_n`` is the strategy-side cap on the returned
:attr:`RerankResult.candidates` length; the composition root binds
``n`` per-frame from this value (default 3 per the epic's K=10 → N=3
spec).
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import Final
from gps_denied_onboard.config.schema import ConfigError
__all__ = [
"C2_5RerankConfig",
"KNOWN_STRATEGIES",
]
KNOWN_STRATEGIES: Final[frozenset[str]] = frozenset({"inlier_count"})
@dataclass(frozen=True)
class C2_5RerankConfig:
    """Per-component config for C2.5 ReRank.

    ``strategy`` selects exactly one of the registered re-rankers
    (today only ``inlier_count``); the composition-root factory
    respects compile-time ``BUILD_RERANK_<variant>`` gating on top
    of this label.

    ``top_n`` is the per-frame N cap (1..K-1). Default 3 (the epic's
    K=10 → N=3 spec).
    """

    strategy: str = "inlier_count"
    top_n: int = 3

    def __post_init__(self) -> None:
        if self.strategy not in KNOWN_STRATEGIES:
            raise ConfigError(
                f"C2_5RerankConfig.strategy={self.strategy!r} not in "
                f"{sorted(KNOWN_STRATEGIES)}"
            )
        if self.top_n < 1:
            raise ConfigError(
                f"C2_5RerankConfig.top_n must be >= 1; got {self.top_n}"
            )
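
The validation above can be exercised in isolation with a sketch like the following. It copies the dataclass from this hunk; the `ConfigError` class here is a local stand-in (its real base class in `gps_denied_onboard.config.schema` is not shown in this diff), and `try_build` is a hypothetical helper added only for the demonstration.

```python
from dataclasses import dataclass
from typing import Final


class ConfigError(ValueError):
    """Local stand-in for gps_denied_onboard.config.schema.ConfigError."""


KNOWN_STRATEGIES: Final = frozenset({"inlier_count"})


@dataclass(frozen=True)
class C2_5RerankConfig:
    strategy: str = "inlier_count"
    top_n: int = 3

    def __post_init__(self) -> None:
        # Validation runs at construction, so a bad block fails at load time.
        if self.strategy not in KNOWN_STRATEGIES:
            raise ConfigError(
                f"C2_5RerankConfig.strategy={self.strategy!r} not in "
                f"{sorted(KNOWN_STRATEGIES)}"
            )
        if self.top_n < 1:
            raise ConfigError(
                f"C2_5RerankConfig.top_n must be >= 1; got {self.top_n}"
            )


def try_build(**kwargs):
    """Hypothetical helper: return the config, or the error message on failure."""
    try:
        return C2_5RerankConfig(**kwargs)
    except ConfigError as exc:
        return str(exc)
```

Note that only the lower bound of ``top_n`` is checked in this block; the ``K-1`` upper bound named in the docstring depends on the C2 candidate count, which the config does not know.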
@@ -0,0 +1,56 @@
"""C2.5 ReRankStrategy error taxonomy (AZ-342).
The family is intentionally narrow: a per-candidate failure is the
normal case (drop-and-continue, INV-8) and is signalled via
``candidates_dropped`` in the returned :class:`RerankResult` —
NOT via an exception. An exception escapes ``rerank`` only when
EVERY candidate fails (:class:`RerankAllCandidatesFailedError`)
which is the C5 → VIO-only-fallback trigger per AC-3.5.
:class:`RerankBackboneError` is raised INSIDE the per-candidate loop,
caught by the strategy, logged ERROR, FDR-stamped, and the
candidate is dropped. It is exposed publicly so the per-candidate
log + FDR taxonomy is observable and so future re-rankers using a
different backbone can re-raise the same kind.
``TileFetchError`` is C6-owned
(``c6_tile_cache.errors.TileNotFoundError`` / ``TileFsError``); the
strategy catches it in the per-candidate loop and treats it
identically to :class:`RerankBackboneError`.
"""
from __future__ import annotations
__all__ = [
"RerankAllCandidatesFailedError",
"RerankBackboneError",
"RerankError",
]
class RerankError(Exception):
    """Base class for the C2.5 rerank error family.

    Caught at the runtime root only when
    :class:`RerankAllCandidatesFailedError` fires; per-candidate
    failures stay inside the strategy.
    """


class RerankBackboneError(RerankError):
    """Per-candidate LightGlue forward-pass failure.

    Typical causes: CUDA OOM, TRT engine deserialize mismatch. Logged
    at ERROR; one FDR record per occurrence; the offending candidate
    is dropped from the rerank set; the surrounding ``rerank`` call
    continues with the remaining candidates (INV-8).
    """


class RerankAllCandidatesFailedError(RerankError):
    """Zero survivors after the per-candidate loop.

    Every candidate's LightGlue or tile fetch failed. Logged at
    ERROR; FDR record ``kind=rerank.all_failed``. C5 falls back to
    VIO-only with provenance ``visual_propagated`` (AC-3.5).
    """
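
The drop-and-continue behaviour this taxonomy supports can be sketched as below. This is an illustration under assumptions, not the AZ-343 implementation: `rerank_sketch`, `Scored` and the `count_inliers` callable are hypothetical names, candidates are reduced to `(tile_id, descriptor_distance)` pairs, and the ERROR logging and FDR stamping are omitted.

```python
from typing import NamedTuple


class RerankError(Exception):
    """Base class, as in the hunk above."""


class RerankBackboneError(RerankError):
    """Per-candidate failure; never escapes the loop."""


class RerankAllCandidatesFailedError(RerankError):
    """Raised only when no candidate survives."""


class Scored(NamedTuple):
    tile_id: tuple
    inlier_count: int
    descriptor_distance: float


def rerank_sketch(candidates, count_inliers, n):
    """Score each candidate, drop the ones that fail, return the top n."""
    survivors = []
    dropped = 0
    for tile_id, descriptor_distance in candidates:
        try:
            inliers = count_inliers(tile_id)
        except RerankBackboneError:
            # INV-8: a per-candidate failure is counted, not propagated.
            dropped += 1
            continue
        survivors.append(Scored(tile_id, inliers, descriptor_distance))

    if not survivors:
        raise RerankAllCandidatesFailedError(
            f"all {dropped} candidates failed"
        )

    # Descending by inlier count; ties broken by descriptor distance ascending.
    survivors.sort(key=lambda s: (-s.inlier_count, s.descriptor_distance))
    return tuple(survivors[:n]), dropped
```

Returning the dropped count alongside the survivors corresponds to the ``candidates_dropped`` signal mentioned in the module docstring; the caller sees partial failure without handling an exception.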
@@ -1,17 +1,98 @@
"""C2.5 `RerankStrategy` Protocol.
"""C2.5 ``ReRankStrategy`` Protocol (AZ-342).
Default: `InlierBasedReranker` (single-pair LightGlue inlier counter, K=10 → N=3).
See `_docs/02_document/components/03_c2_5_rerank/`.
PEP 544 ``typing.Protocol`` decorated with ``@runtime_checkable``; a
single ``rerank`` method that consumes a C2 :class:`VprResult` and
produces a :class:`RerankResult` ranked by single-pair LightGlue
inlier count.
Concrete impl — :class:`InlierCountReRanker` (AZ-343) — lives in a
sibling module and is imported lazily by
:mod:`gps_denied_onboard.runtime_root.rerank_factory`.
The contract at
``_docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md``
v1.0.0 is the authoritative shape; this module mirrors it 1:1.
"""
from __future__ import annotations
from typing import Protocol
from typing import TYPE_CHECKING, Protocol, runtime_checkable
from gps_denied_onboard._types.vpr import RerankResult, VprResult
if TYPE_CHECKING:
from gps_denied_onboard._types.calibration import CameraCalibration
from gps_denied_onboard._types.nav import NavCameraFrame
from gps_denied_onboard._types.rerank import RerankResult
from gps_denied_onboard._types.vpr import VprResult
__all__ = ["ReRankStrategy"]
class RerankStrategy(Protocol):
"""Re-rank C2's top-K candidates down to N via cross-domain match scoring."""
@runtime_checkable
class ReRankStrategy(Protocol):
"""Single-camera re-rank strategy.
def rerank(self, vpr_result: VprResult, n_keep: int = 3) -> RerankResult: ...
Stateless per-frame; the only persistent state is the
constructor-injected
:class:`gps_denied_onboard.helpers.lightglue_runtime.LightGlueRuntime`
helper handle and the :class:`TileStore` Public API reference.
Invariants (see ``rerank_strategy_protocol.md`` v1.0.0):
- **INV-1 single-threaded** — each instance is bound to one
ingest thread; the shared ``LightGlueRuntime`` requires serial
access. Concurrent :meth:`rerank` calls on a single instance
race the GPU stream.
- **INV-2 stateless per-frame** — same inputs → same surviving
candidates in same order.
- **INV-3 top-N descending by inlier_count** — ties broken
deterministically by ``descriptor_distance`` ascending (the
C2-stage value carried forward).
- **INV-4 candidates length bounded** — ``0 < len <= n`` when
returned (zero raises :class:`RerankAllCandidatesFailedError`);
never exceeds ``n``; never exceeds
``len(vpr_result.candidates)``.
- **INV-5 descriptor_distance carried forward unchanged** — the
C2-stage value is preserved on every survivor for FDR
provenance.
- **INV-6 tile_pixels_handle is a reference, NOT a copy** —
``RerankCandidate.tile_pixels_handle`` is the same handle
returned by ``TileStore.read_tile_pixels`` (page-cache
backed).
- **INV-7 deterministic per tuple** — same ``(frame,
vpr_result, corpus, helper)`` → bit-identical
:class:`RerankResult`.
- **INV-8 drop-and-continue** — a per-candidate exception
NEVER propagates out of :meth:`rerank` unless EVERY candidate
fails. C3 relies on this partial-input tolerance.
Error envelope: only :class:`RerankAllCandidatesFailedError`
escapes :meth:`rerank`; per-candidate
:class:`RerankBackboneError` / ``TileFetchError`` from C6 are
caught inside the loop and turned into dropped candidates +
ERROR logs + per-occurrence FDR records.
"""
def rerank(
self,
frame: "NavCameraFrame",
vpr_result: "VprResult",
n: int,
calibration: "CameraCalibration",
) -> "RerankResult":
"""Re-rank the top-K candidates down to top-N by inlier count.
For each ``candidate`` in ``vpr_result.candidates``:
1. Fetch tile pixels via ``TileStore.read_tile_pixels(candidate.tile_id)``.
2. Run a single-pair LightGlue forward via the shared
:class:`LightGlueRuntime` (frame ↔ tile).
3. Record the inlier count.
Sort candidates descending by inlier count; return the top-N
as a :class:`RerankResult`. Drop-and-continue semantics
apply per INV-8.
Raises:
RerankAllCandidatesFailedError: zero survivors after
the per-candidate loop.
"""
...
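
A short sketch of what the ``@runtime_checkable`` decorator does and does not give a conformance test. The Protocol below is a trimmed copy without type annotations, and `FakeReRanker`, `WrongArity` and `NotAStrategy` are hypothetical classes used only for the demonstration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReRankStrategy(Protocol):
    def rerank(self, frame, vpr_result, n, calibration): ...


class FakeReRanker:
    """Has a rerank method with the expected parameters."""

    def rerank(self, frame, vpr_result, n, calibration):
        return ()


class WrongArity:
    """Has a rerank method, but with the wrong signature."""

    def rerank(self):
        return ()


class NotAStrategy:
    """Has no rerank method at all."""


# isinstance() against a runtime-checkable Protocol only checks that the
# named members exist; it does not inspect signatures or return types.
conforms = isinstance(FakeReRanker(), ReRankStrategy)
wrong_arity_passes = isinstance(WrongArity(), ReRankStrategy)
missing_method_passes = isinstance(NotAStrategy(), ReRankStrategy)
```

Because the runtime check is name-based, a conformance test that also needs to pin the parameter list would have to compare signatures separately, for example with ``inspect.signature``.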