# C3 DISK+LightGlue Primary Matcher

**Task**: AZ-345_c3_disk_lightglue
**Name**: C3 DISK+LightGlue Primary Matcher
**Description**: Implement `DiskLightGlueMatcher`, the production-default `CrossDomainMatcher` (per D-C3-1 = (a)). For each top-N=3 candidate in a `RerankResult`: extract DISK keypoints + descriptors from the nav-camera frame and the candidate tile via the C7 `InferenceRuntime` (TensorRT 10.3 FP16 primary, ONNX-Runtime fallback); match keypoints via the shared `LightGlueRuntime` helper (AZ-278); filter inliers + compute median reprojection residual via the shared `RansacFilter` helper (AZ-282); record the result in a `CandidateMatchSet`. Sort surviving candidates descending by inlier count (tie-break: lower median residual ranked higher); return the best as `MatchResult.best_candidate_idx`. Implements the drop-and-continue contract (Invariant 4) for per-candidate `MatcherBackboneError`. Updates the constructor-injected `RollingHealthWindow` after each frame. Composition-root wired via the AZ-344 factory.
**Complexity**: 5 points
**Dependencies**: AZ-344 (Protocol + factory + DTOs + errors + RollingHealthWindow), AZ-263_initial_structure, AZ-269_config_loader, AZ-278_lightglue_runtime (shared LightGlue helper), AZ-282_ransac_filter (shared RANSAC helper), AZ-298_c7_tensorrt_runtime (DISK forward via TRT), AZ-299_c7_onnxrt_fallback (DISK forward via ONNX-RT fallback), AZ-303_c6_storage_interfaces (`tile_pixels_handle` from `RerankResult`; tile pixel decode), AZ-281_engine_filename_schema (DISK engine self-describing filename), AZ-321_c10_engine_compiler (DISK + LightGlue engine compile path), AZ-266_log_module, AZ-272_fdr_record_schema
**Component**: c3_matcher (epic AZ-257 / E-C3)
**Tracker**: AZ-345
**Epic**: AZ-257 (E-C3)

### Document Dependencies

- `_docs/02_document/contracts/c3_matcher/cross_domain_matcher_protocol.md`: Protocol contract (every invariant satisfied; drop-and-continue is INV-4).
- `_docs/02_document/components/04_c3_matcher/description.md`: § 1 D-C3-1 = (a) production-default; § 5 error handling; § 7 shared helper serial access; § 9 logging.
- `_docs/02_document/module-layout.md`: `c3_matcher` Per-Component Mapping (`disk_lightglue.py` Internal); `BUILD_MATCHER_DISK_LIGHTGLUE` row (ON for airborne / research / replay-cli).
- `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md`: single-pair / multi-pair API.
- `_docs/02_document/contracts/shared_helpers/ransac_filter.md`: RANSAC + median residual API.
- `_docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md`: `RerankResult` consumed at input boundary.
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`: DISK forward via `InferenceRuntime`.
- `_docs/02_document/components/04_c3_matcher/tests.md`: C3-IT-01 (best-candidate inlier count p5 ≥ 80); C3-IT-02 (deterministic best_candidate_idx); C3-IT-03 (cross-domain MRE p95 < 2.5 px); C3-IT-04 (tilt ±20° + 350m outliers); C3-IT-05 (`InsufficientInliersError` propagation); C3-PT-01 (latency p95 ≤ 180 ms; per-candidate ≤ 60 ms; GPU mem ≤ 800 MB).

## Problem

Without this task: `compose_root` cannot wire when `config.matcher.strategy = "disk_lightglue"` (the default value); F3 / F6 cannot run; AC-1.1 (best-candidate inlier count p5 ≥ 80) has no producer; AC-2.2 (cross-domain MRE p95 < 2.5 px) is unmeasurable; AC-NEW-7 cache-poisoning safety budget loses its primary detection signal (low-inlier frames in MatcherHealth). The DISK+LightGlue choice is locked per Mode B Fact #110 / D-C3-1; without this task the locked decision is unrealised.

## Outcome

- `src/gps_denied_onboard/components/c3_matcher/disk_lightglue.py` defining:
  - `DiskLightGlueMatcher` class implementing the `CrossDomainMatcher` Protocol (AZ-344).
  - Constructor: `__init__(self, runtime: InferenceRuntime, lightglue_runtime: LightGlueRuntime, ransac_filter: RansacFilter, fdr_client: FdrClient, health_window: RollingHealthWindow, config: MatcherConfig)`. The strategy holds the DISK engine ID (returned by `runtime.load_engine`) plus references to the constructor-injected `LightGlueRuntime` + `RansacFilter`.
  - `match(frame, rerank_result, calibration)`:
    1. Decode + preprocess the nav-camera frame ONCE (resize / normalise per DISK's input contract).
    2. Run DISK forward on the query frame → `(query_keypoints, query_descriptors)`.
    3. `survivors: list[CandidateMatchSet] = []`, `dropped = 0`.
    4. For each `RerankCandidate` in `rerank_result.candidates`:
       a. Decode + preprocess the candidate tile (from `tile_pixels_handle`).
       b. Try DISK forward on the tile → `(tile_keypoints, tile_descriptors)`. On failure: wrap as `MatcherBackboneError`; emit ERROR log + FDR record `kind="matcher.backbone_error"` with `tile_id` + `phase="disk_forward"`; `dropped += 1`; continue.
       c. Try `lightglue_runtime.match_pair(query_keypoints, query_descriptors, tile_keypoints, tile_descriptors)` → `correspondences` (raw matches before RANSAC). On failure: wrap as `MatcherBackboneError`; `phase="lightglue_match"`; drop; continue.
       d. `ransac_result = ransac_filter.filter(correspondences, threshold_px=config.ransac_threshold_px)` → `RansacResult(inlier_correspondences, ransac_outlier_count, per_candidate_residual_px)`. The helper handles RANSAC + median residual computation.
       e. If `ransac_result.inlier_correspondences.shape[0] == 0`: emit DEBUG log `kind="c3.matcher.zero_inliers"`; `dropped += 1`; continue.
       f. Append `CandidateMatchSet(tile_id=candidate.tile_id, inlier_count=ransac_result.inlier_correspondences.shape[0], inlier_correspondences=ransac_result.inlier_correspondences, ransac_outlier_count=ransac_result.ransac_outlier_count, per_candidate_residual_px=ransac_result.per_candidate_residual_px)` to `survivors`.
    5. Determine `survivor_max_inliers = max(s.inlier_count for s in survivors)` (or 0 if empty).
    6. If `len(survivors) == 0` OR `survivor_max_inliers < config.min_inliers_threshold`: emit ERROR log `kind="c3.matcher.insufficient_inliers"` + FDR record `kind="matcher.insufficient_inliers"`; `health_window.update(now, best_inlier_count=0, had_backbone_error=(dropped > 0))`; raise `InsufficientInliersError`.
    7. Sort `survivors` descending by `inlier_count`; ties broken by `per_candidate_residual_px` ascending. The first survivor is the best.
    8. `best = survivors[0]`. If `best.per_candidate_residual_px > config.residual_warn_threshold_px`: emit WARN log `kind="c3.matcher.residual_above_threshold"` (will trigger AdHoP at C3.5).
    9. `health_window.update(now, best_inlier_count=best.inlier_count, had_backbone_error=(dropped > 0))`.
    10. Emit FDR record `kind="matcher.frame_done"` with `{frame_id, candidates_input, candidates_dropped, best_inlier_count, best_residual_px, best_tile_id}`.
    11. Return `MatchResult(frame_id=rerank_result.frame_id, per_candidate=survivors, best_candidate_idx=0, reprojection_residual_px=best.per_candidate_residual_px, matched_at=monotonic_ns(), matcher_label="disk_lightglue", candidates_input=len(rerank_result.candidates), candidates_dropped=dropped)`.
  - `health_snapshot()`: returns `self._health_window.snapshot()`.
  - Module-level `create(config, lightglue_runtime, ransac_filter, inference_runtime, health_window) -> CrossDomainMatcher`:
    1. `disk_weights_path = config.matcher.disk_weights_path` (TRT engine produced by AZ-321).
    2. Load DISK engine via `inference_runtime.load_engine(disk_weights_path)`.
    3. Construct `DiskLightGlueMatcher(...)`.
- Composition-root wiring path for `config.matcher.strategy == "disk_lightglue"`.
- Logging per description.md § 9: INFO ready; WARN residual-above-threshold; ERROR insufficient-inliers + backbone-error; DEBUG per-frame inlier+residual list (gated).
- FDR records: `matcher.frame_done` (always per frame), `matcher.backbone_error` (per error), `matcher.insufficient_inliers` (per all-failed event).

## Scope

### Included

- `DiskLightGlueMatcher` class implementing `CrossDomainMatcher` exactly per the AZ-344 contract.
- DISK forward via C7 `InferenceRuntime` (TRT primary; ONNX-RT fallback chain owned by C7; this task consumes the unified interface).
- LightGlue matching via shared helper.
- RANSAC + median residual via shared `RansacFilter` helper.
- Drop-and-continue per-candidate error handling (Invariant 4).
- Below-threshold all-failed → `InsufficientInliersError`.
- Deterministic best-candidate selection (Invariant 3).
- `RollingHealthWindow.update` after each frame.
- Composition-root wiring path.
- Logging + FDR record emission per description.md § 9.
- Unit tests covering Invariants 1–9, drop-and-continue, below-threshold, deterministic ordering, `tile_pixels_handle` reference semantics, composition-root wiring path.
- `BUILD_MATCHER_DISK_LIGHTGLUE` flag wiring (ON in airborne / research / replay-cli; OFF in operator-tooling).

### Excluded

- The Protocol + DTOs + errors + factory + `RollingHealthWindow`: owned by AZ-344.
- The `LightGlueRuntime` helper: already provided by AZ-278.
- The `RansacFilter` helper: already provided by AZ-282.
- The C7 `InferenceRuntime`: owned by AZ-297..AZ-300.
- DISK engine compile (.onnx → .trt): owned by AZ-321; this task consumes the produced engine.
- ALIKED+LightGlue (AZ-346) and XFeat (AZ-347).
- Component-internal acceptance tests beyond Invariants 1–9 + drop-and-continue smoke: C3-IT-01 (recall floor), C3-IT-03 (cross-domain MRE), C3-IT-04 (tilt outliers), and C3-PT-01 (latency NFR) are deferred to Step 9 / E-BBT.

## Acceptance Criteria

**AC-1: Protocol conformance**
`isinstance(DiskLightGlueMatcher(...), CrossDomainMatcher)` returns `True`.
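An `isinstance` check against a `typing.Protocol` only works when the Protocol is decorated with `@runtime_checkable`, and even then it verifies method presence, not signatures. The sketch below illustrates AC-1 under the assumption that the AZ-344 `CrossDomainMatcher` is runtime-checkable and exposes `match` and `health_snapshot`; both classes here are hypothetical stand-ins, not the real AZ-344 or AZ-345 code.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CrossDomainMatcher(Protocol):
    """Hypothetical stand-in for the Protocol owned by AZ-344."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any: ...

    def health_snapshot(self) -> Any: ...


class DiskLightGlueMatcher:
    """Placeholder with the two Protocol methods; bodies are illustrative only."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        raise NotImplementedError("real matching lives in the production class")

    def health_snapshot(self) -> Any:
        return {}


class NotAMatcher:
    """Lacks health_snapshot, so the structural check must reject it."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        return None


# isinstance on a runtime-checkable Protocol checks that every member exists.
conforms = isinstance(DiskLightGlueMatcher(), CrossDomainMatcher)
rejects = isinstance(NotAMatcher(), CrossDomainMatcher)
```

Because the runtime check is purely structural, a unit test for AC-1 catches a missing method but not a wrong signature; static type checking is the usual complement for that.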

**AC-2: Best-candidate selection: argmax(inlier_count) + tie-break**
Given a `RerankResult` with N=3 candidates whose computed inlier counts are [120, 80, 120] and median residuals [1.4, 1.0, 1.1]
When `match(...)` is called
Then `best_candidate_idx == 0` (the candidate with `inlier_count=120` AND `residual=1.1` (lower than the other 120-inlier candidate's 1.4)); `per_candidate[0].inlier_count == 120 AND per_candidate_residual_px == 1.1`; `per_candidate[1].inlier_count == 120 AND per_candidate_residual_px == 1.4`; `per_candidate[2].inlier_count == 80`.

**AC-3: Drop-and-continue on per-candidate `MatcherBackboneError`**
Given an `InferenceRuntime` test double that raises `RuntimeError` on the 2nd candidate's DISK forward and succeeds on others
When `match(...)` is called
Then `len(per_candidate) == 2`; `candidates_dropped == 1`; ONE ERROR log `kind="c3.matcher.backbone_error"` is emitted with `tile_id` + `phase="disk_forward"`; ONE FDR record `kind="matcher.backbone_error"` is emitted; success path continues.

**AC-4: Drop-and-continue on per-candidate LightGlue failure**
Given a `LightGlueRuntime` test double that raises on the 1st candidate's match call
When `match(...)` is called
Then the candidate is dropped with `phase="lightglue_match"`; ERROR log + FDR record emitted; remaining candidates processed.

**AC-5: Below-threshold → `InsufficientInliersError`**
Given `config.matcher.min_inliers_threshold = 60` AND every candidate's RANSAC inlier count is < 60
When `match(...)` is called
Then `InsufficientInliersError` is raised; ONE ERROR log `kind="c3.matcher.insufficient_inliers"` + ONE FDR record `kind="matcher.insufficient_inliers"` are emitted; `health_window.update(now, best_inlier_count=0, had_backbone_error=False)` is invoked.
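
The selection rule in AC-2 and the threshold gate in AC-5 can be sketched with a single sort key. This is a minimal illustration, not the production code: `CandidateMatchSet` is trimmed to three fields, `InsufficientInliersError` is a local stand-in for the AZ-344 error, and `select_best` is a hypothetical helper name.

```python
from dataclasses import dataclass


class InsufficientInliersError(Exception):
    """Local stand-in for the AZ-344 error type."""


@dataclass(frozen=True)
class CandidateMatchSet:
    """Trimmed stand-in: only the fields the ordering depends on."""

    tile_id: str
    inlier_count: int
    per_candidate_residual_px: float


def select_best(
    survivors: list[CandidateMatchSet], min_inliers_threshold: int
) -> list[CandidateMatchSet]:
    """Return survivors ranked best-first, or raise if none clears the threshold."""
    best_count = max((s.inlier_count for s in survivors), default=0)
    if not survivors or best_count < min_inliers_threshold:
        raise InsufficientInliersError(
            f"best inlier count {best_count} < {min_inliers_threshold}"
        )
    # Descending inlier count; ties resolved by ascending median residual.
    return sorted(
        survivors, key=lambda s: (-s.inlier_count, s.per_candidate_residual_px)
    )


# The AC-2 scenario: inlier counts [120, 80, 120], residuals [1.4, 1.0, 1.1].
candidates = [
    CandidateMatchSet("tile_a", 120, 1.4),
    CandidateMatchSet("tile_b", 80, 1.0),
    CandidateMatchSet("tile_c", 120, 1.1),
]
ranked = select_best(candidates, min_inliers_threshold=60)
```

With this ordering the best candidate is always `ranked[0]`, which is why `best_candidate_idx` is 0 in the returned `MatchResult`.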

**AC-6: All-failed → `InsufficientInliersError`**
Given every candidate's DISK forward raises
When `match(...)` is called
Then `InsufficientInliersError` is raised; per-candidate ERROR logs + final ERROR log emitted; `health_window.update(now, best_inlier_count=0, had_backbone_error=True)` is invoked.

**AC-7: WARN log on residual above threshold**
Given the best candidate's `per_candidate_residual_px = 4.2` AND `config.matcher.residual_warn_threshold_px = 2.5`
When `match(...)` returns
Then ONE WARN log `kind="c3.matcher.residual_above_threshold"` with `{residual_px: 4.2, threshold_px: 2.5}` is emitted.

**AC-8: `health_window.update` invoked after every `match` (success or failure)**
Given any `match(...)` call (success, partial drop, all-failed)
When the call completes (returns normally OR raises `InsufficientInliersError`)
Then `health_window.update(...)` is invoked exactly ONCE for that frame; `best_inlier_count` matches the actual best inlier count (0 on all-failed); `had_backbone_error == True` if any candidate dropped due to backbone failure.

**AC-9: `inlier_correspondences` shape contract**
Given a successful `match(...)`
When inspecting any `CandidateMatchSet`
Then `inlier_correspondences.shape == (inlier_count, 4)`; `dtype == float32`.

**AC-10: Deterministic (same inputs → bit-identical MatchResult)**
Given fixed inputs and deterministic test doubles
When `match(...)` is called 3 times
Then all three returns have identical `per_candidate` content (same inlier_counts, same residuals, same best_candidate_idx).
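
The drop-and-continue behaviour (AC-3, AC-4, AC-6) and the exactly-once health update (AC-8) can be sketched together. The code below is a simplified illustration with hypothetical names: `run_candidate` collapses DISK forward, LightGlue, and RANSAC into one callable returning an inlier count, and `RecordingHealthWindow` is a test double, not the AZ-344 class. It also assumes backbone failures are tracked separately from zero-inlier drops, so that `had_backbone_error` stays `False` when candidates are merely weak, as AC-5 expects.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class MatcherBackboneError(Exception):
    """Local stand-in for the AZ-344 per-candidate error."""


class InsufficientInliersError(Exception):
    """Local stand-in for the AZ-344 frame-level error."""


@dataclass
class RecordingHealthWindow:
    """Test double that records every update call."""

    calls: list = field(default_factory=list)

    def update(self, now: int, best_inlier_count: int, had_backbone_error: bool) -> None:
        self.calls.append((best_inlier_count, had_backbone_error))


def match_frame(
    tile_ids: list[str],
    run_candidate: Callable[[str], int],
    health_window: RecordingHealthWindow,
    min_inliers: int,
) -> list[tuple[str, int]]:
    survivors: list[tuple[str, int]] = []
    backbone_errors: list[MatcherBackboneError] = []
    for tile_id in tile_ids:
        try:
            inliers = run_candidate(tile_id)
        except Exception as exc:
            # Wrap and drop; the real matcher also emits an ERROR log + FDR record.
            backbone_errors.append(MatcherBackboneError(f"{tile_id}: {exc}"))
            continue
        if inliers > 0:  # zero-inlier candidates are dropped without an error
            survivors.append((tile_id, inliers))
    best = max((count for _, count in survivors), default=0)
    enough = best >= min_inliers
    # Single update site, reached on both the success and the failure path.
    health_window.update(
        time.monotonic_ns(), best if enough else 0, bool(backbone_errors)
    )
    if not enough:
        raise InsufficientInliersError(f"best inlier count {best} < {min_inliers}")
    return survivors


def flaky_forward(tile_id: str) -> int:
    """Fails on the second tile, as in the AC-3 scenario."""
    if tile_id == "t2":
        raise RuntimeError("DISK forward failed")
    return 100


window = RecordingHealthWindow()
survivors = match_frame(["t1", "t2", "t3"], flaky_forward, window, min_inliers=60)
```

Placing the update before the raise, rather than in two separate branches, is one way to make the exactly-once property hard to break when the method is later extended.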

**AC-11: Composition-root wiring**
Given `config.matcher.strategy = "disk_lightglue"` AND a constructed shared `LightGlueRuntime` AND `RansacFilter` AND `InferenceRuntime`
When `compose_root(config)` runs
Then a `DiskLightGlueMatcher` instance is wired; ONE INFO log `kind="c3.matcher.ready"` with `{strategy: "disk_lightglue", min_inliers_threshold, residual_warn_threshold_px}` is emitted; the strategy's `_lightglue_runtime` is identity-equal to the runtime root's shared helper.

**AC-12: FDR `matcher.frame_done` per frame**
Given a successful `match(...)` returning best candidate with inlier_count=120 and residual=1.1, dropped=1
When the call completes
Then ONE FDR record `kind="matcher.frame_done"` is emitted with structured fields `{frame_id, candidates_input: 3, candidates_dropped: 1, best_inlier_count: 120, best_residual_px: 1.1, best_tile_id: }`.

## Non-Functional Requirements

**Performance** (deferred validation to C3-PT-01):
- `match` p95 ≤ 180 ms (3 candidates × ~60 ms DISK forward + LightGlue match + RANSAC).
- Per-candidate p95 ≤ 60 ms.
- GPU memory ≤ 800 MB combined (DISK engine + LightGlue engine resident).

**Compatibility**
- DISK engine file format owned by C10 + C7; this task consumes via `config.matcher.disk_weights_path`.
- Upstream DISK research code drop pinned per Plan-phase; weight changes require C10 rebuild + C3-IT-03 re-run.

**Reliability**
- Drop-and-continue per candidate (Invariant 4).
- Single-thread by contract (INV-1).
- `InsufficientInliersError` triggers C5 VIO-only fallback (AC-3.5); does NOT crash.
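
The latency budgets above are stated as p95 values, so a check against them needs a percentile definition. The sketch below uses the nearest-rank method, which is an assumption: C3-PT-01 may define its percentile differently. The 180 ms and 60 ms budgets come from the requirements above; the function names and sample values are hypothetical.

```python
FRAME_P95_BUDGET_MS = 180.0      # `match` p95 budget from the requirements above
CANDIDATE_P95_BUDGET_MS = 60.0   # per-candidate p95 budget from the requirements above


def nearest_rank_percentile(samples_ms: list[float], percent: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(percent / 100 * n)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of percent * n / 100, avoiding float rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def within_budget(frame_ms: list[float], candidate_ms: list[float]) -> bool:
    """True when both p95 values respect their budgets."""
    return (
        nearest_rank_percentile(frame_ms, 95) <= FRAME_P95_BUDGET_MS
        and nearest_rank_percentile(candidate_ms, 95) <= CANDIDATE_P95_BUDGET_MS
    )


# Illustrative samples only: 20 frames with one slow outlier, 60 candidate timings.
frame_samples = [150.0] * 19 + [400.0]
candidate_samples = [50.0] * 60
budget_ok = within_budget(frame_samples, candidate_samples)
```

With 20 samples the nearest-rank p95 is the 19th smallest value, so a single outlier frame does not break the budget but two would.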

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Protocol conformance | `isinstance` returns `True` |
| AC-2 | Best-candidate + tie-break | Lower residual wins among tied inliers |
| AC-3 | DISK forward fails on 2nd | 2 survivors; ERROR log + FDR record |
| AC-4 | LightGlue fails on 1st | 2 survivors; phase="lightglue_match" |
| AC-5 | All below threshold | `InsufficientInliersError`; health update |
| AC-6 | All forwards fail | `InsufficientInliersError`; per-candidate logs |
| AC-7 | Residual > warn threshold | WARN log emitted |
| AC-8 | Health update invoked once per `match` | One update per call regardless of outcome |
| AC-9 | Correspondences shape | (I, 4) float32; I == inlier_count |
| AC-10 | Determinism | 3 calls return identical content |
| AC-11 | `compose_root` wiring | Wired; INFO log; helper identity-shared |
| AC-12 | FDR `frame_done` emission | Correct structured fields |

## Constraints

- **Drop-and-continue is mandatory**: Invariant 4; per-candidate exceptions never propagate.
- **Median residual, not mean**: Invariant 8; computed inside `RansacFilter`.
- **Constructor injection only**: no `import gps_denied_onboard.config` inside the strategy module.
- **`LightGlueRuntime` and `RansacFilter` are constructor-injected**: never instantiated here.
- **DISK engine load at `create` time, NOT at first frame**: engine-output assertion fires at startup.
- **Tile pixel decode is per-call**, but the underlying `tile_pixels_handle` is page-cache-backed (not copied into the strategy).
- **`RollingHealthWindow.update` is called EXACTLY once per `match`**, including the all-failed path.

## Risks & Mitigation

**Risk 1: DISK upstream code drop ships an unsupported ONNX op for TRT 10.3**
- *Mitigation*: engine compile is C10's responsibility (AZ-321). If C10 cannot build the engine, this task is blocked upstream; surface via tracker dependency mechanism.

**Risk 2: `LightGlueRuntime.match_pair` API not yet defined**
- *Mitigation*: AZ-278 defines the helper API; this task consumes whatever AZ-278 ships. If only single-pair is provided, this task wraps single-pair calls in a per-candidate loop (already structured that way). Surface to AZ-278 implementer at decompose-step-4.

**Risk 3: Tile pixel decode is non-trivial cost on hot path**
- *Mitigation*: tile pixels arrive as page-cache-backed handles from C6; decode (JPEG → ndarray) happens once per candidate. If profiling shows this is a bottleneck, a future optimization pre-decodes adjacent tiles in C6's mmap layer.

**Risk 4: Deterministic best-candidate tie-break depends on stable sort**
- *Mitigation*: Python's `list.sort()` and `sorted()` are both stable; the implementation uses `sorted(survivors, key=lambda s: (-s.inlier_count, s.per_candidate_residual_px))`, so candidates that tie on both inlier count and residual keep their input order and the result is deterministic. Test AC-2 asserts the exact ordering on a tie scenario.

**Risk 5: `RollingHealthWindow` drift between matcher implementations**
- *Mitigation*: ONE `RollingHealthWindow` class owned by AZ-344; constructor-injected into every concrete matcher. AZ-345/AZ-346/AZ-347 use the same instance type via the same constructor injection.

## Runtime Completeness

- **Named capability**: `DiskLightGlueMatcher`, the production-default `CrossDomainMatcher` for cross-domain feature matching (architecture / E-C3 / `solution.md` / D-C3-1 / AC-1.1 + AC-2.2 + AC-3.1).
- **Production code that must exist**: real `DiskLightGlueMatcher` calling real C7 `InferenceRuntime` with real TRT-compiled DISK engine; real shared `LightGlueRuntime` calls; real shared `RansacFilter` for inlier filtering + median residual; real `RollingHealthWindow.update` after each frame; real composition-root wiring.
- **Allowed external stubs**: `FakeInferenceRuntime`, `FakeLightGlueRuntime`, `FakeRansacFilter`, `FakeFdrClient`, synthetic frame fixtures for unit tests.
- **Unacceptable substitutes**: a Python+NumPy implementation of DISK forward (would not satisfy C3-PT-01 latency); a different RANSAC implementation per matcher (would defeat AZ-282 helper); skipping `RollingHealthWindow.update` on the all-failed path (would lose the health signal C5 needs); calling `LightGlueRuntime` in batch mode without per-candidate inlier breakdown; using the mean residual instead of the median (would violate INV-8).
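
The last substitute listed above (mean instead of median) matters because a single large residual can pull the mean across the warn threshold while the median stays put. A small worked example, using made-up residual values and the 2.5 px threshold from AC-7:

```python
from statistics import mean, median

# Hypothetical per-inlier reprojection residuals in pixels; one is an outlier.
residuals_px = [0.8, 0.9, 1.0, 1.1, 12.0]
WARN_THRESHOLD_PX = 2.5  # value used in the AC-7 scenario

median_px = median(residuals_px)  # middle value of the sorted list
mean_px = mean(residuals_px)      # pulled upward by the single 12.0 px residual

median_warns = median_px > WARN_THRESHOLD_PX
mean_warns = mean_px > WARN_THRESHOLD_PX
```

Here the median is 1.0 px and raises no warning, while the mean exceeds 2.5 px and would trigger `c3.matcher.residual_above_threshold` on an otherwise good match.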