mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 14:21:14 +00:00
Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started. Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,183 @@
# Contract: `ReRankStrategy` Protocol

**Owner**: c2_5_rerank (epic AZ-256 / E-C2.5)
**Producer task**: AZ-342 (`ReRankStrategy` Protocol + factory + composition)
**Consumer tasks**: AZ-343 (`InlierCountReRanker` impl); downstream c3_matcher (epic AZ-257 / E-C3; TBD at AZ-257 decompose time), which consumes `RerankResult`
**Version**: 1.0.0
**Status**: draft, awaiting AZ-342 implementation
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c2_5_rerank/interface.py` (Protocol), `src/gps_denied_onboard/components/c2_5_rerank/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/rerank_factory.py` (factory)

## Change Log

| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract: Protocol surface, DTOs, error hierarchy, factory signature, 8 invariants, drop-and-continue contract (INV-8) | autodev / decompose Step 2 |

## Purpose

Defines the public interface for the C2.5 inlier-based re-rank strategy: `rerank` consumes a C2 `VprResult` (top-K=10) and produces a `RerankResult` (top-N=3) ranked by single-pair LightGlue inlier count against each candidate's tile pixels. The re-rank step is the architectural boundary between cheap descriptor retrieval (C2) and expensive cross-domain matching (C3): it pays a small extra GPU cost so that C3 operates only on the most promising candidates.

`ReRankStrategy` is a Strategy interface with a single concrete implementation today (`InlierCountReRanker`). Future re-rank algorithms (e.g., learned re-rankers) can be added as additional implementations behind the same interface, gated by `BUILD_RERANK_<variant>` build flags per ADR-002.

The shared `LightGlueRuntime` helper (AZ-278 / `helpers.lightglue_runtime`) is constructor-injected; neither C2.5 nor C3 owns the helper. This resolves R14 (apparent C2.5↔C3 cycle) by making both components sibling consumers of the helper.

## Public API

### Protocol: `ReRankStrategy`

```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration, VprResult, RerankResult


@runtime_checkable
class ReRankStrategy(Protocol):
    """Single-camera re-rank strategy. Stateless per-frame; the only persistent state is the constructor-injected `LightGlueRuntime` helper handle and the `TileStore` Public API reference."""

    def rerank(
        self,
        frame: NavCameraFrame,
        vpr_result: VprResult,
        n: int,
        calibration: CameraCalibration,
    ) -> RerankResult:
        """Re-rank the top-K candidates from `vpr_result` down to top-N by single-pair LightGlue inlier count.

        For each candidate in `vpr_result.candidates`:
          1. Fetch tile pixels via `TileStore.get_tile_pixels(candidate.tile_id)`.
          2. Run a single-pair LightGlue forward via the shared `LightGlueRuntime` (frame ↔ tile).
          3. Record the inlier count.
        Sort candidates descending by inlier count; return the top-N as a `RerankResult`.

        Drop-and-continue semantics: if a per-candidate failure occurs (`TileFetchError` from C6 OR `RerankBackboneError` from LightGlue), the candidate is dropped from the rerank set and a per-candidate ERROR log + FDR record is emitted. Sorting and top-N selection proceed against the surviving candidates.

        If FEWER than N candidates survive, the strategy returns `RerankResult` with whatever it has (length 1..N-1); C3 proceeds with reduced N. If ZERO candidates survive, the strategy raises `RerankAllCandidatesFailedError`; downstream C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5).

        Raises:
            RerankAllCandidatesFailedError: every candidate's LightGlue or tile-fetch failed; no rerank result possible.
        """
        ...
```

**Invariants** (every implementation MUST guarantee):

1. **Single-threaded by contract**: each instance is bound to one ingest thread (the composition root enforces this). The shared `LightGlueRuntime` requires serial access (per description.md § 7); concurrent `rerank` calls on a single instance would race on the GPU stream.
2. **Stateless per-frame**: no implicit dependency on prior frames; reordering `rerank` calls (which the live path NEVER does, but tests do) MUST yield identical `RerankResult` content (same surviving candidates in the same order, given the same inputs).
3. **Top-N ordering by inlier count descending**: `RerankResult.candidates` is sorted descending by `inlier_count`. Ties are broken deterministically by `descriptor_distance` ascending (carried forward from C2). The ordering is stable and reproducible across runs.
4. **`RerankResult.candidates` length is bounded**: `0 < len <= n` when returned (zero raises `RerankAllCandidatesFailedError`); never exceeds `n`; never exceeds `len(vpr_result.candidates)`.
5. **`descriptor_distance` is carried forward unchanged**: re-rank does NOT compute a new descriptor distance; the C2-stage value is preserved on every surviving `RerankCandidate` for FDR provenance.
6. **`tile_pixels_handle` is a reference, NOT a copy**: `RerankCandidate.tile_pixels_handle` is the same handle returned by `TileStore.get_tile_pixels` (page-cache backed). Copying tile pixels at re-rank time would defeat AC-4.1's latency budget.
7. **Deterministic per (frame, vpr_result, corpus, helper) tuple**: given identical inputs and an identical `LightGlueRuntime` helper state, two calls return bit-identical `RerankResult` content (same inlier counts, same ordering, same surviving candidates).
8. **Drop-and-continue is the ONLY per-candidate failure mode**: a per-candidate exception NEVER propagates out of `rerank` unless every candidate fails. This is the contract that lets C3 absorb partial failures gracefully.

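The per-candidate loop and invariants 3, 4 and 8 can be sketched as follows. This is a minimal illustration, not the `InlierCountReRanker` implementation: the exception classes and `Candidate` are local stand-ins, `fetch_pixels` and `count_inliers` are hypothetical callables standing in for `TileStore.get_tile_pixels` and the `LightGlueRuntime` forward pass, and FDR record emission is omitted.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("rerank_sketch")


class TileFetchError(Exception):
    """Local stand-in for the C6-owned tile fetch error."""


class RerankBackboneError(Exception):
    """Local stand-in for a per-candidate LightGlue failure."""


class RerankAllCandidatesFailedError(Exception):
    """Local stand-in for the zero-survivor error."""


@dataclass(frozen=True)
class Candidate:  # hypothetical stand-in for a C2 candidate
    tile_id: tuple
    descriptor_distance: float


def rerank_sketch(candidates, n, fetch_pixels, count_inliers):
    """Return up to n (candidate, inlier_count) pairs, best first."""
    scored = []
    for cand in candidates:
        try:
            pixels = fetch_pixels(cand.tile_id)
            scored.append((cand, count_inliers(pixels)))
        except (TileFetchError, RerankBackboneError) as exc:
            # Drop-and-continue: log the failure, keep processing the rest.
            log.error("dropping candidate %s: %r", cand.tile_id, exc)
    if not scored:
        raise RerankAllCandidatesFailedError("zero candidates survived")
    # Inlier count descending, ties broken by descriptor_distance ascending.
    scored.sort(key=lambda pair: (-pair[1], pair[0].descriptor_distance))
    return scored[:n]


def fake_fetch(tile_id):
    # One tile is missing, to exercise the drop path.
    if tile_id == (18, 2, 2):
        raise TileFetchError("tile missing")
    return tile_id


inliers = {(18, 0, 0): 198, (18, 1, 1): 412, (18, 3, 3): 412}
cands = [
    Candidate((18, 0, 0), 0.30),
    Candidate((18, 1, 1), 0.25),
    Candidate((18, 2, 2), 0.10),
    Candidate((18, 3, 3), 0.20),
]
top = rerank_sketch(cands, 3, fake_fetch, lambda px: inliers[px])
```

In this run the candidate whose tile fetch fails is dropped, and the two candidates tied at 412 inliers are ordered by their carried-forward `descriptor_distance`.
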
### DTOs (in `_types/rerank.py`)

```python
from dataclasses import dataclass
from uuid import UUID
import numpy as np


@dataclass(frozen=True, slots=True)
class RerankCandidate:
    """One re-rank survivor. Carries the C2-stage descriptor_distance forward for FDR provenance plus the new inlier_count from single-pair LightGlue."""

    tile_id: tuple  # composite (zoomLevel, lat, lon); see C6 TileRecord
    inlier_count: int  # single-pair LightGlue inliers; > 0 for any survivor
    descriptor_distance: float  # carried forward from C2's VprCandidate
    descriptor_dim: int  # carried forward from C2 for sanity assertions
    tile_pixels_handle: object  # opaque page-cache-backed pixel reference; see C6 TileStore contract


@dataclass(frozen=True, slots=True)
class RerankResult:
    """Top-N survivors from `ReRankStrategy.rerank`. Consumed by C3 CrossDomainMatcher."""

    frame_id: UUID
    candidates: list[RerankCandidate]  # 0 < len <= n; sorted descending by inlier_count, ties broken by descriptor_distance ascending
    reranked_at: int  # monotonic_ns
    rerank_label: str  # non-empty; matches BUILD_RERANK_<variant> lowercase (e.g., "inlier_count")
    candidates_input: int  # len(vpr_result.candidates) at entry, for FDR observability
    candidates_dropped: int  # candidates_input - len(candidates)
```

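The field comments above imply a handful of checks that a consumer or a test could run on any returned result. The sketch below is one possible way to express them; `check_rerank_result` is a hypothetical helper, not part of the contract, and it relies only on the field names defined in the DTOs.

```python
from types import SimpleNamespace


def check_rerank_result(result, n: int) -> None:
    """Raise ValueError if `result` violates the documented DTO constraints."""
    cands = result.candidates
    if not 0 < len(cands) <= n:
        raise ValueError("candidates length must satisfy 0 < len <= n")
    # Sorted by inlier_count descending, then descriptor_distance ascending.
    keys = [(-c.inlier_count, c.descriptor_distance) for c in cands]
    if keys != sorted(keys):
        raise ValueError("candidates are not in the documented order")
    if result.candidates_dropped != result.candidates_input - len(cands):
        raise ValueError("candidates_dropped does not match its definition")
    if not result.rerank_label:
        raise ValueError("rerank_label must be non-empty")


# Duck-typed stand-ins for RerankCandidate / RerankResult, for illustration only.
good = SimpleNamespace(
    candidates=[
        SimpleNamespace(inlier_count=412, descriptor_distance=0.25),
        SimpleNamespace(inlier_count=287, descriptor_distance=0.31),
    ],
    candidates_input=10,
    candidates_dropped=8,
    rerank_label="inlier_count",
)
check_rerank_result(good, n=3)  # passes silently
```
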
### Error Hierarchy (in `c2_5_rerank/errors.py`)

```python
class RerankError(Exception):
    """Base for all C2.5 re-rank errors. Caught at the runtime root; downstream effect: C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5) only when `RerankAllCandidatesFailedError` is raised."""


class RerankBackboneError(RerankError):
    """Per-candidate LightGlue forward-pass failure (CUDA OOM, TRT engine deserialize mismatch). Logged at ERROR; per-occurrence FDR record. Drop-and-continue: the candidate is dropped from the rerank set, NOT the whole batch."""


class RerankAllCandidatesFailedError(RerankError):
    """Every candidate's LightGlue or tile fetch failed; zero survivors. Logged at ERROR; per-occurrence FDR record `kind=rerank.all_failed`. C5 falls back to VIO-only."""
```

`TileFetchError` is owned by C6 (`components.c6_tile_cache`); C2.5 catches it inside the per-candidate loop and treats it identically to `RerankBackboneError` (drop-and-continue + ERROR log + FDR record `kind=rerank.tile_fetch_error`).

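From the caller's side, the hierarchy means only the zero-survivor case needs handling; per-candidate failures never surface. A minimal sketch of that call site, assuming a hypothetical wrapper `rerank_or_none` (the real runtime-root wiring is not specified here) and local stand-ins for the error classes:

```python
class RerankError(Exception):
    """Local stand-in for the C2.5 base error."""


class RerankAllCandidatesFailedError(RerankError):
    """Local stand-in for the zero-survivor error."""


def rerank_or_none(strategy, frame, vpr_result, n, calibration):
    """Return the RerankResult, or None when every candidate failed.

    A None return is the signal for the caller to take the VIO-only
    fallback path (provenance `visual_propagated`).
    """
    try:
        return strategy.rerank(frame, vpr_result, n, calibration)
    except RerankAllCandidatesFailedError:
        return None


class _AllFail:
    """Hypothetical strategy whose every candidate fails."""

    def rerank(self, frame, vpr_result, n, calibration):
        raise RerankAllCandidatesFailedError("zero survivors")


class _Echo:
    """Hypothetical strategy that returns a fixed marker result."""

    def rerank(self, frame, vpr_result, n, calibration):
        return "result"


fallback = rerank_or_none(_AllFail(), None, None, 3, None)
normal = rerank_or_none(_Echo(), None, None, 3, None)
```
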
## Composition-Root Factory

```python
# src/gps_denied_onboard/runtime_root/rerank_factory.py

from gps_denied_onboard.config import Config
from gps_denied_onboard.components.c2_5_rerank import ReRankStrategy
from gps_denied_onboard.components.c6_tile_cache import TileStore
from gps_denied_onboard.helpers.lightglue_runtime import LightGlueRuntime


def build_rerank_strategy(
    config: Config,
    tile_store: TileStore,
    lightglue_runtime: LightGlueRuntime,
) -> ReRankStrategy:
    """Composition-root factory. Reads `config.rerank.strategy` (currently only `"inlier_count"` is defined; future strategies extend the table); lazy-imports the concrete strategy module gated by its CMake `BUILD_RERANK_<variant>` flag; refuses to instantiate a strategy whose flag is OFF (raises `ConfigurationError` pointing at the offending strategy name + missing flag).

    Strategy resolution table:

    | config.rerank.strategy | Implementation        | Module                                            | Build flag                |
    |------------------------|-----------------------|---------------------------------------------------|---------------------------|
    | "inlier_count"         | InlierCountReRanker   | components.c2_5_rerank.inlier_based_reranker      | BUILD_RERANK_INLIER_COUNT |

    The shared `LightGlueRuntime` is constructor-injected; the factory does NOT own its lifecycle. The runtime root constructs ONE `LightGlueRuntime` instance and passes the same reference to both this factory (for C2.5) and the C3 matcher factory.

    Returns a fully-constructed strategy ready for `rerank` invocation. The caller (runtime root) is responsible for binding the instance to one ingest thread.
    """
    ...
```

## Versioning

- The `ReRankStrategy` Protocol's method signature is part of the cross-component public API. Any change (new method, removed method, parameter rename, return-type change) is a major bump and requires updating every concrete implementation in lockstep.
- DTO field additions are minor bumps, provided each new field is optional and defaults to `None` on the frozen dataclass; field removals are major.
- The drop-and-continue contract (Invariant 8) is non-negotiable; changing it would break C3's tolerance of partial input.

## Test Cases (protocol conformance — runs against every concrete strategy)

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| INV-1 (single-thread) | Composition root rejects multi-thread binding | `RuntimeError` on second binding attempt |
| INV-2 (stateless) | `rerank(frame_A)` then `rerank(frame_B)` then `rerank(frame_A)` again with the same `vpr_result` | First and third call return identical `RerankResult` (same surviving candidates, same order) |
| INV-3 (top-N order) | Mixed inlier counts (e.g., [412, 198, 287, 0, 153, ...]) on K=10 input with N=3 | Returned candidates sorted descending by inlier_count: [412, 287, 198] |
| INV-3 (tie-break) | Two candidates with identical inlier_count but different descriptor_distance | Lower descriptor_distance ranked first |
| INV-4 (length bound) | N=3 with K=10 input, all 10 succeeding | `len(result.candidates) == 3` |
| INV-4 (length under failure) | N=3 with K=10 input, 8 candidates fail | `len(result.candidates) == 2`; `candidates_dropped == 8` |
| INV-5 (descriptor_distance carried) | Each survivor's `descriptor_distance` | Equals the C2-stage value from `vpr_result.candidates[i].descriptor_distance` |
| INV-6 (handle is reference) | Mutate the underlying tile pixel buffer and re-read via `tile_pixels_handle` | Mutation visible (proves no copy) |
| INV-7 (deterministic) | `rerank(same inputs)` × 3 | All three return bit-identical `RerankResult` (same inlier_counts, same ordering, same surviving tile_ids) |
| INV-8 (drop-and-continue) | One candidate raises `RerankBackboneError`; nine succeed | Result has 3 survivors from the surviving 9; ONE ERROR log per failed candidate; the success path is NOT interrupted |
| AC-2.5-IT-01 (top-1 promotion rate) | `rerank` against fixture corpus where C2 top-1 was correct | Top-1 promotion rate ≥ 0.98 (C2's top-1 is preserved as result top-1 in ≥ 98% of frames) |
| AC-2.5-IT-02 (drop-and-continue smoke) | Inject `RerankBackboneError` for one candidate | Drop semantics hold; surviving candidates re-ranked |
| AC-2.5-IT-03 (helper serial-access) | Two `rerank` calls on the same instance from a single thread | Second call sees no `LightGlueRuntime` state corruption from the first; results bit-identical to single-threaded baseline |
| All-fail | Inject `RerankBackboneError` for every candidate | `RerankAllCandidatesFailedError` raised; per-candidate ERROR logs + final `kind=rerank.all_failed` FDR record |

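The INV-2 and INV-7 rows compare results across calls. Because `RerankResult.reranked_at` is a `monotonic_ns` timestamp, it will differ between any two calls, so a conformance check presumably has to compare the candidate content (as the parentheticals in those rows suggest) rather than whole objects. The sketch below shows one way to do that; `content_key`, `is_stateless` and `FakeStrategy` are hypothetical test helpers, and the simplified `vpr_result` is a list of `(tile_id, descriptor_distance)` pairs.

```python
import time
from types import SimpleNamespace


def content_key(result):
    """Comparable view of a result that ignores the `reranked_at` timestamp."""
    return [
        (c.tile_id, c.inlier_count, c.descriptor_distance)
        for c in result.candidates
    ]


def is_stateless(strategy, frame_a, frame_b, vpr_result, n, calibration):
    """INV-2 style check: A, then B, then A again; first and third must match."""
    first = strategy.rerank(frame_a, vpr_result, n, calibration)
    strategy.rerank(frame_b, vpr_result, n, calibration)
    third = strategy.rerank(frame_a, vpr_result, n, calibration)
    return content_key(first) == content_key(third)


class FakeStrategy:
    """Hypothetical stand-in that scores tiles from a fixed per-frame table."""

    def __init__(self, scores):
        self._scores = scores

    def rerank(self, frame, vpr_result, n, calibration):
        scored = [
            SimpleNamespace(
                tile_id=tile_id,
                inlier_count=self._scores[frame][tile_id],
                descriptor_distance=distance,
            )
            for tile_id, distance in vpr_result
        ]
        scored.sort(key=lambda c: (-c.inlier_count, c.descriptor_distance))
        return SimpleNamespace(
            candidates=scored[:n], reranked_at=time.monotonic_ns()
        )


scores = {"A": {1: 5, 2: 9}, "B": {1: 7, 2: 3}}
vpr = [(1, 0.2), (2, 0.4)]
stateless = is_stateless(FakeStrategy(scores), "A", "B", vpr, 2, None)
```
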
## Open Questions / Risks

- **Risk: the shared `LightGlueRuntime` helper's serial-access invariant must be enforced upstream**, by the composition root binding both C2.5 and C3 to the same single ingest thread. *Mitigation*: AZ-278 (helper) ships with an internal assertion on each call that the calling thread matches the binding thread; AZ-342 (this Protocol task) consumes the helper as a constructor dependency and does NOT need to add a per-call check.
- **Risk: `tile_pixels_handle` semantics drift between C6's `TileStore` Public API and C2.5's expectation.** C2.5 expects a page-cache-backed reference, NOT a copy; C6's `get_tile_pixels` MUST guarantee that. *Mitigation*: cross-referenced in AZ-303 (`tile_store` contract); the contract test for `get_tile_pixels` asserts that the returned object has the same identity across two calls within a TTL window.
- **Risk: `n` parameter clamping vs. epic spec.** The epic fixes K=10, N=3; the Protocol leaves `n` parametric for testability. *Mitigation*: composition root binds `n=3` from `config.rerank.top_n` (default 3); the Protocol accepts arbitrary `n` so tests can use smaller values.
- **Risk: drop-and-continue can mask a backbone-wide regression.** If every flight has 3/10 candidates failing silently, recall degrades without any single failure being investigated. *Mitigation*: `RerankResult.candidates_dropped` is published per-frame; an FDR aggregate alert (post-flight tooling) flags flights with `candidates_dropped` p95 > 1.

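The thread-binding mitigation in the first risk, together with the INV-1 test row, could be realized with a small guard like the one sketched below. `ThreadBoundGuard` is a hypothetical name; how AZ-278 actually implements its assertion is not specified in this contract.

```python
import threading


class ThreadBoundGuard:
    """Binds to one thread once, then rejects calls from any other thread."""

    def __init__(self):
        self._owner = None

    def bind(self):
        # A second binding attempt is an error, matching the INV-1 test row.
        if self._owner is not None:
            raise RuntimeError("already bound to an ingest thread")
        self._owner = threading.get_ident()

    def check(self):
        # Called at the top of every helper entry point.
        if self._owner != threading.get_ident():
            raise RuntimeError("called from a thread other than the bound one")


guard = ThreadBoundGuard()
guard.bind()
guard.check()  # same thread: passes

errors = []


def worker():
    try:
        guard.check()
    except RuntimeError as exc:
        errors.append(exc)


t = threading.Thread(target=worker)
t.start()
t.join()
```

After the worker thread finishes, `errors` holds the single `RuntimeError` raised by the cross-thread call.
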