C3 DISK+LightGlue Primary Matcher
Task: AZ-345_c3_disk_lightglue
Name: C3 DISK+LightGlue Primary Matcher
Description: Implement DiskLightGlueMatcher, the production-default CrossDomainMatcher (per D-C3-1 = (a)). For each top-N=3 candidate in a RerankResult: extract DISK keypoints + descriptors from the nav-camera frame and the candidate tile via the C7 InferenceRuntime (TensorRT 10.3 FP16 primary, ONNX-Runtime fallback); match keypoints via the shared LightGlueRuntime helper (AZ-278); filter inliers + compute median reprojection residual via the shared RansacFilter helper (AZ-282); record the result in a CandidateMatchSet. Sort surviving candidates descending by inlier count (tie-break: lower median residual ranked higher); return the best as MatchResult.best_candidate_idx. Implements the drop-and-continue contract (Invariant 4) for per-candidate MatcherBackboneError. Updates the constructor-injected RollingHealthWindow after each frame. Composition-root wired via the AZ-344 factory.
Complexity: 5 points
Dependencies: AZ-344 (Protocol + factory + DTOs + errors + RollingHealthWindow), AZ-263_initial_structure, AZ-269_config_loader, AZ-278_lightglue_runtime (shared LightGlue helper), AZ-282_ransac_filter (shared RANSAC helper), AZ-298_c7_tensorrt_runtime (DISK forward via TRT), AZ-299_c7_onnxrt_fallback (DISK forward via ONNX-RT fallback), AZ-303_c6_storage_interfaces (tile_pixels_handle from RerankResult; tile pixel decode), AZ-281_engine_filename_schema (DISK engine self-describing filename), AZ-321_c10_engine_compiler (DISK + LightGlue engine compile path), AZ-266_log_module, AZ-272_fdr_record_schema
Component: c3_matcher (epic AZ-257 / E-C3)
Tracker: AZ-345
Epic: AZ-257 (E-C3)
Document Dependencies
- _docs/02_document/contracts/c3_matcher/cross_domain_matcher_protocol.md: Protocol contract (every invariant satisfied; drop-and-continue is INV-4).
- _docs/02_document/components/04_c3_matcher/description.md: § 1 D-C3-1 = (a) production-default; § 5 error handling; § 7 shared helper serial access; § 9 logging.
- _docs/02_document/module-layout.md: c3_matcher Per-Component Mapping (disk_lightglue.py Internal); BUILD_MATCHER_DISK_LIGHTGLUE row (ON for airborne / research / replay-cli).
- _docs/02_document/contracts/shared_helpers/lightglue_runtime.md: single-pair / multi-pair API.
- _docs/02_document/contracts/shared_helpers/ransac_filter.md: RANSAC + median residual API.
- _docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md: RerankResult consumed at input boundary.
- _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md: DISK forward via InferenceRuntime.
- _docs/02_document/components/04_c3_matcher/tests.md: C3-IT-01 (best-candidate inlier count p5 ≥ 80); C3-IT-02 (deterministic best_candidate_idx); C3-IT-03 (cross-domain MRE p95 < 2.5 px); C3-IT-04 (tilt ±20° + 350 m outliers); C3-IT-05 (InsufficientInliersError propagation); C3-PT-01 (latency p95 ≤ 180 ms; per-candidate ≤ 60 ms; GPU mem ≤ 800 MB).
Problem
Without this task: compose_root cannot wire when config.matcher.strategy = "disk_lightglue" (the default value); F3 / F6 cannot run; AC-1.1 (best-candidate inlier count p5 ≥ 80) has no producer; AC-2.2 (cross-domain MRE p95 < 2.5 px) is unmeasurable; AC-NEW-7 cache-poisoning safety budget loses its primary detection signal (low-inlier frames in MatcherHealth). The DISK+LightGlue choice is locked per Mode B Fact #110 / D-C3-1; without this task the locked decision is unrealised.
Outcome
- src/gps_denied_onboard/components/c3_matcher/disk_lightglue.py defining:
  - DiskLightGlueMatcher class implementing the CrossDomainMatcher Protocol (AZ-344).
  - Constructor: __init__(self, runtime: InferenceRuntime, lightglue_runtime: LightGlueRuntime, ransac_filter: RansacFilter, fdr_client: FdrClient, health_window: RollingHealthWindow, config: MatcherConfig). The strategy holds the DISK engine ID (returned by runtime.load_engine) plus references to the constructor-injected LightGlueRuntime + RansacFilter.
  - match(frame, rerank_result, calibration):
    1. Decode + preprocess the nav-camera frame ONCE (resize / normalise per DISK's input contract).
    2. Run DISK forward on the query frame → (query_keypoints, query_descriptors).
    3. survivors: list[CandidateMatchSet] = [], dropped = 0.
    4. For each RerankCandidate in rerank_result.candidates:
       a. Decode + preprocess the candidate tile (from tile_pixels_handle).
       b. Try DISK forward on the tile → (tile_keypoints, tile_descriptors). On failure: wrap as MatcherBackboneError; emit ERROR log + FDR record kind="matcher.backbone_error" with tile_id + phase="disk_forward"; dropped += 1; continue.
       c. Try lightglue_runtime.match_pair(query_keypoints, query_descriptors, tile_keypoints, tile_descriptors) → correspondences (raw matches before RANSAC). On failure: wrap as MatcherBackboneError; phase="lightglue_match"; drop; continue.
       d. ransac_result = ransac_filter.filter(correspondences, threshold_px=config.ransac_threshold_px) → RansacResult(inlier_correspondences, ransac_outlier_count, per_candidate_residual_px). The helper handles RANSAC + median residual computation.
       e. If ransac_result.inlier_correspondences.shape[0] == 0: emit DEBUG log kind="c3.matcher.zero_inliers"; dropped += 1; continue.
       f. Append CandidateMatchSet(tile_id=candidate.tile_id, inlier_count=ransac_result.inlier_correspondences.shape[0], inlier_correspondences=ransac_result.inlier_correspondences, ransac_outlier_count=ransac_result.ransac_outlier_count, per_candidate_residual_px=ransac_result.per_candidate_residual_px) to survivors.
    5. Determine survivor_max_inliers = max(s.inlier_count for s in survivors) (or 0 if empty).
    6. If len(survivors) == 0 OR survivor_max_inliers < config.min_inliers_threshold: emit ERROR log kind="c3.matcher.insufficient_inliers" + FDR record kind="matcher.insufficient_inliers"; health_window.update(now, best_inlier_count=0, had_backbone_error=(dropped > 0)); raise InsufficientInliersError.
    7. Sort survivors descending by inlier_count; ties broken by per_candidate_residual_px ascending. The first survivor is the best.
    8. best = survivors[0]. If best.per_candidate_residual_px > config.residual_warn_threshold_px: emit WARN log kind="c3.matcher.residual_above_threshold" (will trigger AdHoP at C3.5).
    9. health_window.update(now, best_inlier_count=best.inlier_count, had_backbone_error=(dropped > 0)).
    10. Emit FDR record kind="matcher.frame_done" with {frame_id, candidates_input, candidates_dropped, best_inlier_count, best_residual_px, best_tile_id}.
    11. Return MatchResult(frame_id=rerank_result.frame_id, per_candidate=survivors, best_candidate_idx=0, reprojection_residual_px=best.per_candidate_residual_px, matched_at=monotonic_ns(), matcher_label="disk_lightglue", candidates_input=len(rerank_result.candidates), candidates_dropped=dropped).
  - health_snapshot(): returns self._health_window.snapshot().
  - Module-level create(config, lightglue_runtime, ransac_filter, inference_runtime, health_window) -> CrossDomainMatcher:
    - disk_weights_path = config.matcher.disk_weights_path (TRT engine produced by AZ-321).
    - Load DISK engine via inference_runtime.load_engine(disk_weights_path).
    - Construct DiskLightGlueMatcher(...).
- Composition-root wiring path for config.matcher.strategy == "disk_lightglue".
- Logging per description.md § 9: INFO ready; WARN residual-above-threshold; ERROR insufficient-inliers + backbone-error; DEBUG per-frame inlier+residual list (gated).
- FDR records: matcher.frame_done (always per frame), matcher.backbone_error (per error), matcher.insufficient_inliers (per all-failed event).
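The per-candidate loop and the threshold gate in match can be sketched as follows. This is a minimal illustration under stated assumptions, not the AZ-345 implementation: Survivor, match_candidates, process_candidate, and on_backbone_error are hypothetical names, the two exception classes are stand-ins for the AZ-344 error types, tile IDs are plain strings, and logging, FDR emission, and the health-window update are left out.

```python
from dataclasses import dataclass


class MatcherBackboneError(Exception):
    """Stand-in for the AZ-344 error type."""


class InsufficientInliersError(Exception):
    """Stand-in for the AZ-344 error type."""


@dataclass(frozen=True)
class Survivor:
    """Hypothetical, trimmed-down CandidateMatchSet."""
    tile_id: str
    inlier_count: int
    residual_px: float


def match_candidates(tile_ids, process_candidate, min_inliers, on_backbone_error):
    """Drop-and-continue loop.

    process_candidate(tile_id) is a hypothetical callable standing in for
    DISK forward + LightGlue + RANSAC; it returns (inlier_count, residual_px)
    or raises on a backbone failure.
    """
    survivors, dropped = [], 0
    for tile_id in tile_ids:
        try:
            inliers, residual = process_candidate(tile_id)
        except Exception as exc:
            # Wrap and report, then move on: the failure never propagates.
            on_backbone_error(MatcherBackboneError(f"{tile_id}: {exc}"))
            dropped += 1
            continue
        if inliers == 0:
            dropped += 1
            continue
        survivors.append(Survivor(tile_id, inliers, residual))

    best_inliers = max((s.inlier_count for s in survivors), default=0)
    if best_inliers < min_inliers:
        raise InsufficientInliersError(f"best={best_inliers}, dropped={dropped}")

    # Most inliers first; lower residual wins a tie.
    survivors.sort(key=lambda s: (-s.inlier_count, s.residual_px))
    return survivors, dropped


def fake_pipeline(tile_id):
    """Test double: the second candidate fails in the backbone."""
    if tile_id == "t2":
        raise RuntimeError("DISK forward failed")
    return {"t1": (120, 1.4), "t3": (90, 1.1)}[tile_id]


errors = []
survivors, dropped = match_candidates(["t1", "t2", "t3"], fake_pipeline, 60, errors.append)
```

With the fake pipeline above, one candidate is dropped and the other two survive in inlier-count order, which is the shape of result AC-3 asks for.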
Scope
Included
- DiskLightGlueMatcher class implementing CrossDomainMatcher exactly per the AZ-344 contract.
- DISK forward via C7 InferenceRuntime (TRT primary; ONNX-RT fallback chain owned by C7; this task consumes the unified interface).
- LightGlue matching via shared helper.
- RANSAC + median residual via shared RansacFilter helper.
- Drop-and-continue per-candidate error handling (Invariant 4).
- Below-threshold all-failed → InsufficientInliersError.
- Deterministic best-candidate selection (Invariant 3).
- RollingHealthWindow.update after each frame.
- Composition-root wiring path.
- Logging + FDR record emission per description.md § 9.
- Unit tests covering Invariants 1–9, drop-and-continue, below-threshold, deterministic ordering, tile_pixels_handle reference semantics, composition-root wiring path.
- BUILD_MATCHER_DISK_LIGHTGLUE flag wiring (ON in airborne / research / replay-cli; OFF in operator-tooling).
Excluded
- The Protocol + DTOs + errors + factory + RollingHealthWindow: owned by AZ-344.
- The LightGlueRuntime helper: covered by AZ-278.
- The RansacFilter helper: covered by AZ-282.
- The C7 InferenceRuntime: owned by AZ-297..AZ-300.
- DISK engine compile (.onnx → .trt): owned by AZ-321; this task consumes the produced engine.
- ALIKED+LightGlue (AZ-346) and XFeat (AZ-347).
- Component-internal acceptance tests beyond Invariants 1–9 + drop-and-continue smoke: C3-IT-01 (recall floor), C3-IT-03 (cross-domain MRE), C3-IT-04 (tilt outliers), and C3-PT-01 (latency NFR) are deferred to Step 9 / E-BBT.
Acceptance Criteria
AC-1: Protocol conformance
isinstance(DiskLightGlueMatcher(...), CrossDomainMatcher) returns True.
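A sketch of how this check behaves, assuming the AZ-344 Protocol is a typing.Protocol decorated with @runtime_checkable (isinstance against a Protocol raises TypeError without that decorator). The two class bodies below are hypothetical skeletons; only the method names match and health_snapshot come from this task.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CrossDomainMatcher(Protocol):
    """Stand-in for the AZ-344 Protocol."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any: ...

    def health_snapshot(self) -> Any: ...


class DiskLightGlueMatcher:
    """Hypothetical skeleton: structural conformance only, no real matching."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        raise NotImplementedError

    def health_snapshot(self) -> Any:
        return {}


class NotAMatcher:
    """Lacks health_snapshot, so the structural check fails."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        raise NotImplementedError


conforms = isinstance(DiskLightGlueMatcher(), CrossDomainMatcher)
rejects = not isinstance(NotAMatcher(), CrossDomainMatcher)
```

A runtime Protocol check only verifies that the members exist; it does not validate signatures or return types, so the remaining ACs still have to exercise behaviour.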
AC-2: Best-candidate selection — argmax(inlier_count) + tie-break
Given a RerankResult with N=3 candidates whose computed inlier counts are [120, 80, 120] and median residuals [1.4, 1.0, 1.1]
When match(...) is called
Then best_candidate_idx == 0, indexing the sorted per_candidate list: per_candidate[0] is the 120-inlier candidate with residual 1.1, which outranks the other 120-inlier candidate (residual 1.4); per_candidate[0].inlier_count == 120 AND per_candidate[0].per_candidate_residual_px == 1.1; per_candidate[1].inlier_count == 120 AND per_candidate[1].per_candidate_residual_px == 1.4; per_candidate[2].inlier_count == 80.
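The ordering in this scenario can be reproduced with a composite sort key. The sketch below uses a trimmed-down, hypothetical CandidateMatchSet holding only the fields that drive ordering; the tile_id strings are illustrative labels, not real tile identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMatchSet:
    """Trimmed-down stand-in: only the fields that drive ordering."""
    tile_id: str
    inlier_count: int
    per_candidate_residual_px: float


def rank(survivors):
    """Most inliers first; among equal inlier counts, lower residual first."""
    return sorted(
        survivors,
        key=lambda s: (-s.inlier_count, s.per_candidate_residual_px),
    )


# Inlier counts [120, 80, 120] and residuals [1.4, 1.0, 1.1], as in AC-2.
survivors = [
    CandidateMatchSet("tile_a", 120, 1.4),
    CandidateMatchSet("tile_b", 80, 1.0),
    CandidateMatchSet("tile_c", 120, 1.1),
]
ranked = rank(survivors)
```

Because the best candidate always lands at position 0 after sorting, best_candidate_idx is 0 by construction, even though the winner was the third input candidate.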
AC-3: Drop-and-continue on per-candidate MatcherBackboneError
Given an InferenceRuntime test double that raises RuntimeError on the 2nd candidate's DISK forward and succeeds on others
When match(...) is called
Then len(per_candidate) == 2; candidates_dropped == 1; ONE ERROR log kind="c3.matcher.backbone_error" is emitted with tile_id + phase="disk_forward"; ONE FDR record kind="matcher.backbone_error" is emitted; success path continues.
AC-4: Drop-and-continue on per-candidate LightGlue failure
Given a LightGlueRuntime test double that raises on the 1st candidate's match call
When match(...) is called
Then the candidate is dropped with phase="lightglue_match"; ERROR log + FDR record emitted; remaining candidates processed.
AC-5: Below-threshold → InsufficientInliersError
Given config.matcher.min_inliers_threshold = 60 AND every candidate's RANSAC inlier count is < 60
When match(...) is called
Then InsufficientInliersError is raised; ONE ERROR log kind="c3.matcher.insufficient_inliers" + ONE FDR record kind="matcher.insufficient_inliers" are emitted; health_window.update(now, best_inlier_count=0, had_backbone_error=False) is invoked.
AC-6: All-failed → InsufficientInliersError
Given every candidate's DISK forward raises
When match(...) is called
Then InsufficientInliersError is raised; per-candidate ERROR logs + final ERROR log emitted; health_window.update(now, best_inlier_count=0, had_backbone_error=True) is invoked.
AC-7: WARN log on residual above threshold
Given the best candidate's per_candidate_residual_px = 4.2 AND config.matcher.residual_warn_threshold_px = 2.5
When match(...) returns
Then ONE WARN log kind="c3.matcher.residual_above_threshold" with {residual_px: 4.2, threshold_px: 2.5} is emitted.
AC-8: health_window.update invoked after every match (success or failure)
Given any match(...) call (success, partial drop, all-failed)
When the call completes (returns normally OR raises InsufficientInliersError)
Then health_window.update(...) is invoked exactly ONCE for that frame; best_inlier_count matches the actual best inlier count (0 on all-failed); had_backbone_error == True if any candidate dropped due to backbone failure.
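One way to make the exactly-once property easy to test is to route both outcomes through a single exit point. The sketch below is illustrative only: finish_frame and RecordingHealthWindow are hypothetical names, and it counts backbone failures separately from zero-inlier drops, which is one reading of the had_backbone_error wording in this AC.

```python
class InsufficientInliersError(Exception):
    """Stand-in for the AZ-344 error type."""


class RecordingHealthWindow:
    """Test double that records every update call."""

    def __init__(self):
        self.calls = []

    def update(self, now, best_inlier_count, had_backbone_error):
        self.calls.append((best_inlier_count, had_backbone_error))


def finish_frame(health_window, now, inlier_counts, backbone_errors, min_inliers):
    """Update the health window once, then either return or raise."""
    best = max(inlier_counts, default=0)
    ok = best >= min_inliers
    health_window.update(
        now,
        best_inlier_count=best if ok else 0,  # failure path reports 0
        had_backbone_error=backbone_errors > 0,
    )
    if not ok:
        raise InsufficientInliersError(f"best inlier count {best} < {min_inliers}")
    return best


window = RecordingHealthWindow()
finish_frame(window, 0, [120, 80], 1, 60)  # success with one backbone drop
try:
    finish_frame(window, 1, [], 3, 60)     # all three candidates failed
except InsufficientInliersError:
    pass
```

Each call to finish_frame contributes exactly one entry to window.calls, whether it returns or raises.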
AC-9: inlier_correspondences shape contract
Given a successful match(...)
When inspecting any CandidateMatchSet
Then inlier_correspondences.shape == (inlier_count, 4); dtype == float32.
AC-10: Determinism (same inputs → identical MatchResult content, excluding the matched_at timestamp)
Given fixed inputs and deterministic test doubles
When match(...) is called 3 times
Then all three returns have identical per_candidate content (same inlier_counts, same residuals, same best_candidate_idx).
AC-11: Composition-root wiring
Given config.matcher.strategy = "disk_lightglue" AND a constructed shared LightGlueRuntime AND RansacFilter AND InferenceRuntime
When compose_root(config) runs
Then a DiskLightGlueMatcher instance is wired; ONE INFO log kind="c3.matcher.ready" with {strategy: "disk_lightglue", min_inliers_threshold, residual_warn_threshold_px} is emitted; the strategy's _lightglue_runtime is identity-equal to the runtime root's shared helper.
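The identity requirement on the shared helper can be illustrated with a small strategy registry. This is a sketch under assumptions: wire_matcher, MATCHER_FACTORIES, and create_disk_lightglue are hypothetical names, the matcher takes only two of its real constructor arguments, and the actual compose_root and AZ-344 factory may be structured differently.

```python
class DiskLightGlueMatcher:
    """Hypothetical skeleton holding only the injected helpers."""

    def __init__(self, lightglue_runtime, ransac_filter):
        self._lightglue_runtime = lightglue_runtime
        self._ransac_filter = ransac_filter


def create_disk_lightglue(lightglue_runtime, ransac_filter):
    # The helpers are passed through, never re-instantiated.
    return DiskLightGlueMatcher(lightglue_runtime, ransac_filter)


# Strategy name → factory; other strategies would register here as well.
MATCHER_FACTORIES = {"disk_lightglue": create_disk_lightglue}


def wire_matcher(strategy, lightglue_runtime, ransac_filter):
    """Look up the factory for the configured strategy and build the matcher."""
    try:
        factory = MATCHER_FACTORIES[strategy]
    except KeyError:
        raise ValueError(f"unknown matcher strategy: {strategy!r}") from None
    return factory(lightglue_runtime, ransac_filter)


shared_lightglue = object()  # placeholder for the shared LightGlueRuntime
shared_ransac = object()     # placeholder for the shared RansacFilter
matcher = wire_matcher("disk_lightglue", shared_lightglue, shared_ransac)
```

A test for this AC can then assert with the is operator that the matcher holds the very same helper object the composition root created.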
AC-12: FDR matcher.frame_done per frame
Given a successful match(...) returning best candidate with inlier_count=120 and residual=1.1, dropped=1
When the call completes
Then ONE FDR record kind="matcher.frame_done" is emitted with structured fields {frame_id, candidates_input: 3, candidates_dropped: 1, best_inlier_count: 120, best_residual_px: 1.1, best_tile_id: <tuple>}.
Non-Functional Requirements
Performance (deferred validation to C3-PT-01):
- match p95 ≤ 180 ms (3 candidates × ~60 ms each for DISK forward + LightGlue match + RANSAC).
- Per-candidate p95 ≤ 60 ms.
- GPU memory ≤ 800 MB combined (DISK engine + LightGlue engine resident).
Compatibility
- DISK engine file format owned by C10 + C7; this task consumes via config.matcher.disk_weights_path.
- Upstream DISK research code drop pinned per Plan-phase; weight changes require C10 rebuild + C3-IT-03 re-run.
Reliability
- Drop-and-continue per candidate (Invariant 4).
- Single-thread by contract (INV-1).
- InsufficientInliersError triggers C5 VIO-only fallback (AC-3.5); does NOT crash.
Unit Tests
| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Protocol conformance | isinstance returns True |
| AC-2 | Best-candidate + tie-break | Lower residual wins among tied inliers |
| AC-3 | DISK forward fails on 2nd | 2 survivors; ERROR log + FDR record |
| AC-4 | LightGlue fails on 1st | 2 survivors; phase="lightglue_match" |
| AC-5 | All below threshold | InsufficientInliersError; health update |
| AC-6 | All forwards fail | InsufficientInliersError; per-candidate logs |
| AC-7 | Residual > warn threshold | WARN log emitted |
| AC-8 | Health update invoked once per match | One update per call regardless of outcome |
| AC-9 | Correspondences shape | (I, 4) float32; I == inlier_count |
| AC-10 | Determinism | 3 calls return identical content |
| AC-11 | compose_root wiring | Wired; INFO log; helper identity-shared |
| AC-12 | FDR frame_done emission | Correct structured fields |
Constraints
- Drop-and-continue is mandatory (Invariant 4); per-candidate exceptions never propagate.
- Median residual, not mean (Invariant 8); computed inside RansacFilter.
- Constructor injection only: no import gps_denied_onboard.config inside the strategy module.
- LightGlueRuntime and RansacFilter are constructor-injected, never instantiated here.
- DISK engine load at create time, NOT at first frame, so the engine-output assertion fires at startup.
- Tile pixel decode is per-call, but the underlying tile_pixels_handle is page-cache-backed (not copied into the strategy).
- RollingHealthWindow.update is called EXACTLY once per match, including the all-failed path.
Risks & Mitigation
Risk 1: DISK upstream code drop ships an unsupported ONNX op for TRT 10.3
- Mitigation: engine compile is C10's responsibility (AZ-321). If C10 cannot build the engine, this task is blocked upstream — surface via tracker dependency mechanism.
Risk 2: LightGlueRuntime.match_pair API not yet defined
- Mitigation: AZ-278 defines the helper API; this task consumes whatever AZ-278 ships. If only single-pair is provided, this task wraps single-pair calls in a per-candidate loop (already structured that way). Surface to AZ-278 implementer at decompose-step-4.
Risk 3: Tile pixel decode is non-trivial cost on hot path
- Mitigation: tile pixels arrive as page-cache-backed handles from C6; decode (JPEG → ndarray) happens once per candidate. If profiling shows this is a bottleneck, a future optimization pre-decodes adjacent tiles in C6's mmap layer.
Risk 4: Deterministic best-candidate tie-break depends on stable sort
- Mitigation: Python's list.sort() and sorted() are stable; the implementation uses sorted(survivors, key=lambda s: (-s.inlier_count, s.per_candidate_residual_px)), which is deterministic. Test AC-2 asserts the exact ordering on a tie scenario.
Risk 5: RollingHealthWindow drift between matcher implementations
- Mitigation: ONE RollingHealthWindow class owned by AZ-344; constructor-injected into every concrete matcher. AZ-345/AZ-346/AZ-347 use the same instance type via the same constructor injection.
Runtime Completeness
- Named capability: DiskLightGlueMatcher, the production-default CrossDomainMatcher for cross-domain feature matching (architecture / E-C3 / solution.md / D-C3-1 / AC-1.1 + AC-2.2 + AC-3.1).
- Production code that must exist: real DiskLightGlueMatcher calling real C7 InferenceRuntime with real TRT-compiled DISK engine; real shared LightGlueRuntime calls; real shared RansacFilter for inlier filtering + median residual; real RollingHealthWindow.update after each frame; real composition-root wiring.
- Allowed external stubs: FakeInferenceRuntime, FakeLightGlueRuntime, FakeRansacFilter, FakeFdrClient, synthetic frame fixtures for unit tests.
- Unacceptable substitutes: a Python+NumPy implementation of DISK forward (would not satisfy C3-PT-01 latency); a different RANSAC implementation per matcher (would defeat AZ-282 helper); skipping RollingHealthWindow.update on the all-failed path (would lose the health signal C5 needs); calling LightGlueRuntime in batch mode without per-candidate inlier breakdown; using the mean residual instead of the median (would violate INV-8).