mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 19:31:15 +00:00
[AZ-345] [AZ-346] [AZ-347] [AZ-349] Archive batch 57 task specs
Move completed task specs from _docs/02_tasks/todo/ to _docs/02_tasks/done/ now that the four tickets are In Testing. Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,209 @@
# C3 DISK+LightGlue Primary Matcher
**Task**: AZ-345_c3_disk_lightglue
**Name**: C3 DISK+LightGlue Primary Matcher
**Description**: Implement `DiskLightGlueMatcher`, the production-default `CrossDomainMatcher` (per D-C3-1 = (a)). For each top-N=3 candidate in a `RerankResult`: extract DISK keypoints + descriptors from the nav-camera frame and the candidate tile via the C7 `InferenceRuntime` (TensorRT 10.3 FP16 primary, ONNX-Runtime fallback); match keypoints via the shared `LightGlueRuntime` helper (AZ-278); filter inliers + compute median reprojection residual via the shared `RansacFilter` helper (AZ-282); record the result in a `CandidateMatchSet`. Sort surviving candidates descending by inlier count (tie-break: lower median residual ranked higher); return the best as `MatchResult.best_candidate_idx`. Implements the drop-and-continue contract (Invariant 4) for per-candidate `MatcherBackboneError`. Updates the constructor-injected `RollingHealthWindow` after each frame. Composition-root wired via the AZ-344 factory.
**Complexity**: 5 points
**Dependencies**: AZ-344 (Protocol + factory + DTOs + errors + RollingHealthWindow), AZ-263_initial_structure, AZ-269_config_loader, AZ-278_lightglue_runtime (shared LightGlue helper), AZ-282_ransac_filter (shared RANSAC helper), AZ-298_c7_tensorrt_runtime (DISK forward via TRT), AZ-299_c7_onnxrt_fallback (DISK forward via ONNX-RT fallback), AZ-303_c6_storage_interfaces (`tile_pixels_handle` from `RerankResult`; tile pixel decode), AZ-281_engine_filename_schema (DISK engine self-describing filename), AZ-321_c10_engine_compiler (DISK + LightGlue engine compile path), AZ-266_log_module, AZ-272_fdr_record_schema
**Component**: c3_matcher (epic AZ-257 / E-C3)
**Tracker**: AZ-345
**Epic**: AZ-257 (E-C3)
### Document Dependencies
- `_docs/02_document/contracts/c3_matcher/cross_domain_matcher_protocol.md` — Protocol contract (every invariant satisfied; drop-and-continue is INV-4).
- `_docs/02_document/components/04_c3_matcher/description.md` — § 1 D-C3-1 = (a) production-default; § 5 error handling; § 7 shared helper serial access; § 9 logging.
- `_docs/02_document/module-layout.md` — `c3_matcher` Per-Component Mapping (`disk_lightglue.py` Internal); `BUILD_MATCHER_DISK_LIGHTGLUE` row (ON for airborne / research / replay-cli).
- `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md` — single-pair / multi-pair API.
- `_docs/02_document/contracts/shared_helpers/ransac_filter.md` — RANSAC + median residual API.
- `_docs/02_document/contracts/c2_5_rerank/rerank_strategy_protocol.md` — `RerankResult` consumed at input boundary.
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — DISK forward via `InferenceRuntime`.
- `_docs/02_document/components/04_c3_matcher/tests.md` — C3-IT-01 (best-candidate inlier count p5 ≥ 80); C3-IT-02 (deterministic best_candidate_idx); C3-IT-03 (cross-domain MRE p95 < 2.5 px); C3-IT-04 (tilt ±20° + 350m outliers); C3-IT-05 (`InsufficientInliersError` propagation); C3-PT-01 (latency p95 ≤ 180 ms; per-candidate ≤ 60 ms; GPU mem ≤ 800 MB).
## Problem
Without this task: `compose_root` cannot wire when `config.matcher.strategy = "disk_lightglue"` (the default value); F3 / F6 cannot run; AC-1.1 (best-candidate inlier count p5 ≥ 80) has no producer; AC-2.2 (cross-domain MRE p95 < 2.5 px) is unmeasurable; AC-NEW-7 cache-poisoning safety budget loses its primary detection signal (low-inlier frames in MatcherHealth). The DISK+LightGlue choice is locked per Mode B Fact #110 / D-C3-1; without this task the locked decision is unrealised.
## Outcome
- `src/gps_denied_onboard/components/c3_matcher/disk_lightglue.py` defining:
  - `DiskLightGlueMatcher` class implementing the `CrossDomainMatcher` Protocol (AZ-344).
  - Constructor: `__init__(self, runtime: InferenceRuntime, lightglue_runtime: LightGlueRuntime, ransac_filter: RansacFilter, fdr_client: FdrClient, health_window: RollingHealthWindow, config: MatcherConfig)`. The strategy holds the DISK engine ID (returned by `runtime.load_engine`) plus references to the constructor-injected `LightGlueRuntime` + `RansacFilter`.
  - `match(frame, rerank_result, calibration)`:
    1. Decode + preprocess the nav-camera frame ONCE (resize / normalise per DISK's input contract).
    2. Run DISK forward on the query frame → `(query_keypoints, query_descriptors)`.
    3. `survivors: list[CandidateMatchSet] = []`, `dropped = 0`, `backbone_errors = 0`.
    4. For each `RerankCandidate` in `rerank_result.candidates`:
       a. Decode + preprocess the candidate tile (from `tile_pixels_handle`).
       b. Try DISK forward on the tile → `(tile_keypoints, tile_descriptors)`. On failure: wrap as `MatcherBackboneError`; emit ERROR log `kind="c3.matcher.backbone_error"` + FDR record `kind="matcher.backbone_error"` with `tile_id` + `phase="disk_forward"`; `dropped += 1`; `backbone_errors += 1`; continue.
       c. Try `lightglue_runtime.match_pair(query_keypoints, query_descriptors, tile_keypoints, tile_descriptors)` → `correspondences` (raw matches before RANSAC). On failure: wrap as `MatcherBackboneError`; emit the same ERROR log + FDR record with `phase="lightglue_match"`; `dropped += 1`; `backbone_errors += 1`; continue.
       d. `ransac_result = ransac_filter.filter(correspondences, threshold_px=config.ransac_threshold_px)` → `RansacResult(inlier_correspondences, ransac_outlier_count, per_candidate_residual_px)`. The helper handles RANSAC + median residual computation.
       e. If `ransac_result.inlier_correspondences.shape[0] == 0`: emit DEBUG log `kind="c3.matcher.zero_inliers"`; `dropped += 1`; continue. A zero-inlier drop is not a backbone error, so `backbone_errors` is unchanged.
       f. Append `CandidateMatchSet(tile_id=candidate.tile_id, inlier_count=ransac_result.inlier_correspondences.shape[0], inlier_correspondences=ransac_result.inlier_correspondences, ransac_outlier_count=ransac_result.ransac_outlier_count, per_candidate_residual_px=ransac_result.per_candidate_residual_px)` to `survivors`.
    5. Determine `survivor_max_inliers = max((s.inlier_count for s in survivors), default=0)`.
    6. If `len(survivors) == 0` OR `survivor_max_inliers < config.min_inliers_threshold`: emit ERROR log `kind="c3.matcher.insufficient_inliers"` + FDR record `kind="matcher.insufficient_inliers"`; `health_window.update(now, best_inlier_count=0, had_backbone_error=(backbone_errors > 0))`; raise `InsufficientInliersError`.
    7. Sort `survivors` descending by `inlier_count`; ties broken by `per_candidate_residual_px` ascending. The first survivor is the best.
    8. `best = survivors[0]`. If `best.per_candidate_residual_px > config.residual_warn_threshold_px`: emit WARN log `kind="c3.matcher.residual_above_threshold"` (will trigger AdHoP at C3.5).
    9. `health_window.update(now, best_inlier_count=best.inlier_count, had_backbone_error=(backbone_errors > 0))`.
    10. Emit FDR record `kind="matcher.frame_done"` with `{frame_id, candidates_input, candidates_dropped, best_inlier_count, best_residual_px, best_tile_id}`.
    11. Return `MatchResult(frame_id=rerank_result.frame_id, per_candidate=survivors, best_candidate_idx=0, reprojection_residual_px=best.per_candidate_residual_px, matched_at=monotonic_ns(), matcher_label="disk_lightglue", candidates_input=len(rerank_result.candidates), candidates_dropped=dropped)`.
  - `health_snapshot()`: returns `self._health_window.snapshot()`.
  - Module-level `create(config, lightglue_runtime, ransac_filter, inference_runtime, health_window) -> CrossDomainMatcher`:
    1. `disk_weights_path = config.matcher.disk_weights_path` (TRT engine produced by AZ-321).
    2. Load DISK engine via `inference_runtime.load_engine(disk_weights_path)`.
    3. Construct `DiskLightGlueMatcher(...)`.
- Composition-root wiring path for `config.matcher.strategy == "disk_lightglue"`.
- Logging per description.md § 9: INFO ready; WARN residual-above-threshold; ERROR insufficient-inliers + backbone-error; DEBUG per-frame inlier+residual list (gated).
- FDR records: `matcher.frame_done` (always per frame), `matcher.backbone_error` (per error), `matcher.insufficient_inliers` (per below-threshold or all-failed frame).
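
The per-candidate control flow of `match` can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AZ-345 implementation: `rank_candidates` and `match_one` are hypothetical names, `match_one` stands in for the decode, DISK forward, LightGlue and RANSAC steps, the two classes are simplified stand-ins for the AZ-344 types, and logging, FDR emission and the health-window update are left out. Counting backbone failures separately from zero-inlier drops is one reading of AC-8.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


class InsufficientInliersError(RuntimeError):
    """Simplified stand-in for the AZ-344 frame-level error."""


@dataclass(frozen=True)
class CandidateMatchSet:
    """Reduced to the fields the ranking needs."""
    tile_id: str
    inlier_count: int
    per_candidate_residual_px: float


def rank_candidates(
    tile_ids: Sequence[str],
    match_one: Callable[[str], tuple[int, float]],
    min_inliers_threshold: int,
) -> tuple[list[CandidateMatchSet], int, bool]:
    """Return (survivors sorted best-first, dropped count, had_backbone_error)."""
    survivors: list[CandidateMatchSet] = []
    dropped = 0
    backbone_errors = 0
    for tile_id in tile_ids:
        try:
            # Hypothetical stand-in for DISK forward + LightGlue + RANSAC.
            inlier_count, residual_px = match_one(tile_id)
        except Exception:
            # Drop-and-continue: a real implementation would wrap the error,
            # log it and emit an FDR record before moving on.
            dropped += 1
            backbone_errors += 1
            continue
        if inlier_count == 0:
            # Zero inliers is a drop, but not a backbone error.
            dropped += 1
            continue
        survivors.append(CandidateMatchSet(tile_id, inlier_count, residual_px))

    best_count = max((s.inlier_count for s in survivors), default=0)
    if not survivors or best_count < min_inliers_threshold:
        raise InsufficientInliersError(f"best inlier count {best_count}")

    # More inliers first; lower median residual breaks ties.
    survivors.sort(key=lambda s: (-s.inlier_count, s.per_candidate_residual_px))
    return survivors, dropped, backbone_errors > 0


def fake_match(tile_id: str) -> tuple[int, float]:
    """Test double: the second tile fails, the others succeed."""
    if tile_id == "t1":
        raise RuntimeError("DISK forward failed")
    return {"t0": (120, 1.4), "t2": (120, 1.1)}[tile_id]


ranked, dropped_count, had_backbone_error = rank_candidates(
    ["t0", "t1", "t2"], fake_match, min_inliers_threshold=60
)
```

In a full implementation both exits (the normal return and the `InsufficientInliersError` path) would call `health_window.update` once before leaving.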
## Scope
### Included
- `DiskLightGlueMatcher` class implementing `CrossDomainMatcher` exactly per the AZ-344 contract.
- DISK forward via C7 `InferenceRuntime` (TRT primary; ONNX-RT fallback chain owned by C7 — this task consumes the unified interface).
- LightGlue matching via shared helper.
- RANSAC + median residual via shared `RansacFilter` helper.
- Drop-and-continue per-candidate error handling (Invariant 4).
- Below-threshold or all-failed frame → `InsufficientInliersError`.
- Deterministic best-candidate selection (Invariant 3).
- `RollingHealthWindow.update` after each frame.
- Composition-root wiring path.
- Logging + FDR record emission per description.md § 9.
- Unit tests covering Invariants 1–9, drop-and-continue, below-threshold, deterministic ordering, `tile_pixels_handle` reference semantics, composition-root wiring path.
- `BUILD_MATCHER_DISK_LIGHTGLUE` flag wiring (ON in airborne / research / replay-cli; OFF in operator-tooling).
### Excluded
- The Protocol + DTOs + errors + factory + `RollingHealthWindow` — owned by AZ-344.
- The `LightGlueRuntime` helper — already AZ-278.
- The `RansacFilter` helper — already AZ-282.
- The C7 `InferenceRuntime` — owned by AZ-297..AZ-300.
- DISK engine compile (.onnx → .trt) — owned by AZ-321; this task consumes the produced engine.
- ALIKED+LightGlue (AZ-346) and XFeat (AZ-347).
- Component-internal acceptance tests beyond Invariants 1–9 + drop-and-continue smoke: C3-IT-01 (recall floor), C3-IT-03 (cross-domain MRE), C3-IT-04 (tilt outliers) and C3-PT-01 (latency NFR) are deferred to Step 9 / E-BBT.
## Acceptance Criteria
**AC-1: Protocol conformance**
`isinstance(DiskLightGlueMatcher(...), CrossDomainMatcher)` returns `True`.
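
An `isinstance` check against a `typing.Protocol` only works when the Protocol is decorated with `@runtime_checkable`; without the decorator Python raises `TypeError`. The sketch below assumes AZ-344 declares `CrossDomainMatcher` that way. The method bodies and the `NotAMatcher` class are placeholders for illustration.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CrossDomainMatcher(Protocol):
    """Assumed shape of the AZ-344 Protocol: two public methods."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any: ...

    def health_snapshot(self) -> Any: ...


class DiskLightGlueMatcher:
    """Structural conformance: no inheritance from the Protocol is needed."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        raise NotImplementedError("placeholder body")

    def health_snapshot(self) -> Any:
        return {}


class NotAMatcher:
    """Has match() but no health_snapshot(), so it does not conform."""

    def match(self, frame: Any, rerank_result: Any, calibration: Any) -> Any:
        return None


conforms = isinstance(DiskLightGlueMatcher(), CrossDomainMatcher)
rejects = not isinstance(NotAMatcher(), CrossDomainMatcher)
```

The runtime check only verifies that the named methods exist. It does not compare signatures or return types, so AC-1 is a smoke test and the remaining criteria carry the behavioural contract.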
**AC-2: Best-candidate selection — argmax(inlier_count) + tie-break**
Given a `RerankResult` with N=3 candidates whose computed inlier counts are [120, 80, 120] and median residuals [1.4, 1.0, 1.1]
When `match(...)` is called
Then `best_candidate_idx == 0`, an index into the sorted `per_candidate` list: the winner is the candidate with `inlier_count=120` and `residual=1.1`, which beats the other 120-inlier candidate's 1.4; `per_candidate[0].inlier_count == 120 AND per_candidate[0].per_candidate_residual_px == 1.1`; `per_candidate[1].inlier_count == 120 AND per_candidate[1].per_candidate_residual_px == 1.4`; `per_candidate[2].inlier_count == 80`.
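
The AC-2 scenario can be worked through with the composite sort key quoted under Risk 4. The dictionary layout and the `input_idx` field are illustrative only; they are not part of the `CandidateMatchSet` contract.

```python
# AC-2 inputs: inlier counts [120, 80, 120], median residuals [1.4, 1.0, 1.1].
candidates = [
    {"input_idx": 0, "inlier_count": 120, "residual_px": 1.4},
    {"input_idx": 1, "inlier_count": 80, "residual_px": 1.0},
    {"input_idx": 2, "inlier_count": 120, "residual_px": 1.1},
]

# Negating the inlier count sorts it descending while the residual stays ascending.
ranked = sorted(candidates, key=lambda c: (-c["inlier_count"], c["residual_px"]))

order = [c["input_idx"] for c in ranked]  # [2, 0, 1]
best_candidate_idx = 0  # always the head of the sorted list
```

The candidate that arrived third wins, which is why `best_candidate_idx` refers to the sorted survivor list and not to the position in `rerank_result.candidates`.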
**AC-3: Drop-and-continue on per-candidate `MatcherBackboneError`**
Given an `InferenceRuntime` test double that raises `RuntimeError` on the 2nd candidate's DISK forward and succeeds on the others
When `match(...)` is called
Then `len(per_candidate) == 2`; `candidates_dropped == 1`; ONE ERROR log `kind="c3.matcher.backbone_error"` is emitted with `tile_id` + `phase="disk_forward"`; ONE FDR record `kind="matcher.backbone_error"` is emitted; the remaining candidates follow the success path.

**AC-4: Drop-and-continue on per-candidate LightGlue failure**
Given a `LightGlueRuntime` test double that raises on the 1st candidate's match call
When `match(...)` is called
Then the candidate is dropped with `phase="lightglue_match"`; ERROR log + FDR record emitted; remaining candidates processed.

**AC-5: Below-threshold → `InsufficientInliersError`**
Given `config.matcher.min_inliers_threshold = 60` AND every candidate's RANSAC inlier count is < 60
When `match(...)` is called
Then `InsufficientInliersError` is raised; ONE ERROR log `kind="c3.matcher.insufficient_inliers"` + ONE FDR record `kind="matcher.insufficient_inliers"` are emitted; `health_window.update(now, best_inlier_count=0, had_backbone_error=False)` is invoked.

**AC-6: All-failed → `InsufficientInliersError`**
Given every candidate's DISK forward raises
When `match(...)` is called
Then `InsufficientInliersError` is raised; per-candidate ERROR logs + final ERROR log emitted; `health_window.update(now, best_inlier_count=0, had_backbone_error=True)` is invoked.

**AC-7: WARN log on residual above threshold**
Given the best candidate's `per_candidate_residual_px = 4.2` AND `config.matcher.residual_warn_threshold_px = 2.5`
When `match(...)` returns
Then ONE WARN log `kind="c3.matcher.residual_above_threshold"` with `{residual_px: 4.2, threshold_px: 2.5}` is emitted.

**AC-8: `health_window.update` invoked after every `match` (success or failure)**
Given any `match(...)` call (success, partial drop, all-failed)
When the call completes (returns normally OR raises `InsufficientInliersError`)
Then `health_window.update(...)` is invoked exactly ONCE for that frame; `best_inlier_count` matches the actual best inlier count (0 on all-failed); `had_backbone_error == True` if any candidate was dropped due to backbone failure.

**AC-9: `inlier_correspondences` shape contract**
Given a successful `match(...)`
When inspecting any `CandidateMatchSet`
Then `inlier_correspondences.shape == (inlier_count, 4)`; `dtype == float32`.
**AC-10: Deterministic — same inputs → bit-identical MatchResult**
Given fixed inputs and deterministic test doubles
When `match(...)` is called 3 times
Then all three returns have identical `per_candidate` content (same inlier_counts, same residuals, same best_candidate_idx).

**AC-11: Composition-root wiring**
Given `config.matcher.strategy = "disk_lightglue"` AND a constructed shared `LightGlueRuntime` AND `RansacFilter` AND `InferenceRuntime`
When `compose_root(config)` runs
Then a `DiskLightGlueMatcher` instance is wired; ONE INFO log `kind="c3.matcher.ready"` with `{strategy: "disk_lightglue", min_inliers_threshold, residual_warn_threshold_px}` is emitted; the strategy's `_lightglue_runtime` is identity-equal to the runtime root's shared helper.

**AC-12: FDR `matcher.frame_done` per frame**
Given a successful `match(...)` returning best candidate with inlier_count=120 and residual=1.1, dropped=1
When the call completes
Then ONE FDR record `kind="matcher.frame_done"` is emitted with structured fields `{frame_id, candidates_input: 3, candidates_dropped: 1, best_inlier_count: 120, best_residual_px: 1.1, best_tile_id: <tuple>}`.
## Non-Functional Requirements
**Performance** (deferred validation to C3-PT-01):
- `match` p95 ≤ 180 ms (3 candidates × ~60 ms DISK forward + LightGlue match + RANSAC).
- Per-candidate p95 ≤ 60 ms.
- GPU memory ≤ 800 MB combined (DISK engine + LightGlue engine resident).
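
The latency budget is simple arithmetic: three candidates at 60 ms each fill the 180 ms frame budget exactly. A hedged sketch of how a benchmark harness might check the two p95 limits follows. The nearest-rank percentile definition is an assumption (C3-PT-01 may define p95 differently), and the function names and sample values are hypothetical.

```python
TOP_N = 3
CANDIDATE_BUDGET_MS = 60.0
FRAME_BUDGET_MS = 180.0


def p95_nearest_rank(samples_ms):
    """Nearest-rank 95th percentile: smallest sample with >= 95% at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(frame_samples_ms, candidate_samples_ms):
    """True when both the per-frame and per-candidate p95 meet their limits."""
    return (
        p95_nearest_rank(frame_samples_ms) <= FRAME_BUDGET_MS
        and p95_nearest_rank(candidate_samples_ms) <= CANDIDATE_BUDGET_MS
    )


# Synthetic, made-up measurements purely to exercise the helper.
frame_ms = [150.0] * 19 + [400.0]
candidate_ms = [50.0] * 19 + [90.0]
ok = within_budget(frame_ms, candidate_ms)
```

With 20 samples the nearest-rank p95 is the 19th smallest value, so the single slow outlier in each synthetic list does not break the budget.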
**Compatibility**
- DISK engine file format owned by C10 + C7; this task consumes via `config.matcher.disk_weights_path`.
- Upstream DISK research code drop pinned per Plan-phase; weight changes require C10 rebuild + C3-IT-03 re-run.

**Reliability**
- Drop-and-continue per candidate (Invariant 4).
- Single-thread by contract (INV-1).
- `InsufficientInliersError` triggers C5 VIO-only fallback (AC-3.5); does NOT crash.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Protocol conformance | `isinstance` returns `True` |
| AC-2 | Best-candidate + tie-break | Lower residual wins among tied inliers |
| AC-3 | DISK forward fails on 2nd | 2 survivors; ERROR log + FDR record |
| AC-4 | LightGlue fails on 1st | 2 survivors; phase="lightglue_match" |
| AC-5 | All below threshold | `InsufficientInliersError`; health update |
| AC-6 | All forwards fail | `InsufficientInliersError`; per-candidate logs |
| AC-7 | Residual > warn threshold | WARN log emitted |
| AC-8 | Health update invoked once per `match` | One update per call regardless of outcome |
| AC-9 | Correspondences shape | (I, 4) float32; I == inlier_count |
| AC-10 | Determinism | 3 calls return identical content |
| AC-11 | `compose_root` wiring | Wired; INFO log; helper identity-shared |
| AC-12 | FDR `frame_done` emission | Correct structured fields |
## Constraints
- **Drop-and-continue is mandatory** — Invariant 4; per-candidate exceptions never propagate.
- **Median residual, not mean** — Invariant 8; computed inside `RansacFilter`.
- **Constructor injection only** — no `import gps_denied_onboard.config` inside the strategy module.
- **`LightGlueRuntime` and `RansacFilter` are constructor-injected** — never instantiated here.
- **DISK engine load at `create` time, NOT at first frame** — engine-output assertion fires at startup.
- **Tile pixel decode is per-call** — but the underlying `tile_pixels_handle` is page-cache-backed (not copied into the strategy).
- **`RollingHealthWindow.update` is called EXACTLY once per `match`** — including the all-failed path.
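
Two of the constraints above (engine load at `create` time, helpers injected and never instantiated) can be illustrated with a small factory sketch. Everything here is a simplified assumption: `FakeInferenceRuntime`, the `engine:` ID format, the `disk.engine` path and the reduced `DiskLightGlueMatcher` dataclass are placeholders, and the `fdr_client` and `MatcherConfig` arguments of the real constructor are omitted.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


class FakeInferenceRuntime:
    """Test double that records every engine path it is asked to load."""

    def __init__(self) -> None:
        self.loaded = []

    def load_engine(self, path: str) -> str:
        self.loaded.append(path)
        return f"engine:{path}"  # hypothetical engine ID format


@dataclass
class DiskLightGlueMatcher:
    """Reduced stand-in holding only the injected collaborators."""
    engine_id: str
    lightglue_runtime: Any
    ransac_filter: Any
    health_window: Any


def create(config, lightglue_runtime, ransac_filter, inference_runtime, health_window):
    # Load eagerly so a missing or incompatible engine fails at startup,
    # not on the first frame.
    engine_id = inference_runtime.load_engine(config.matcher.disk_weights_path)
    # Helpers are passed straight through; nothing is constructed here.
    return DiskLightGlueMatcher(engine_id, lightglue_runtime, ransac_filter, health_window)


config = SimpleNamespace(matcher=SimpleNamespace(disk_weights_path="disk.engine"))
runtime = FakeInferenceRuntime()
shared_lightglue = object()
matcher = create(config, shared_lightglue, object(), runtime, object())
```

Because the factory forwards the helper it received, an identity check such as `matcher.lightglue_runtime is shared_lightglue` holds, which is the property AC-11 asserts for the composition root.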
## Risks & Mitigation
**Risk 1: DISK upstream code drop ships an unsupported ONNX op for TRT 10.3**
- *Mitigation*: engine compile is C10's responsibility (AZ-321). If C10 cannot build the engine, this task is blocked upstream; surface the block via the tracker dependency mechanism.

**Risk 2: `LightGlueRuntime.match_pair` API not yet defined**
- *Mitigation*: AZ-278 defines the helper API; this task consumes whatever AZ-278 ships. If only single-pair is provided, this task wraps single-pair calls in a per-candidate loop (already structured that way). Surface to AZ-278 implementer at decompose-step-4.

**Risk 3: Tile pixel decode is non-trivial cost on hot path**
- *Mitigation*: tile pixels arrive as page-cache-backed handles from C6; decode (JPEG → ndarray) happens once per candidate. If profiling shows this is a bottleneck, a future optimization could pre-decode adjacent tiles in C6's mmap layer.

**Risk 4: Deterministic best-candidate tie-break depends on stable sort**
- *Mitigation*: Python's `sorted()` and `list.sort()` are both stable; the implementation uses `sorted(survivors, key=lambda s: (-s.inlier_count, s.per_candidate_residual_px))`, which is deterministic: candidates tied on both inlier count and residual keep their input order. Test AC-2 asserts the exact ordering on a tie scenario.

**Risk 5: `RollingHealthWindow` drift between matcher implementations**
- *Mitigation*: ONE `RollingHealthWindow` class owned by AZ-344; constructor-injected into every concrete matcher. AZ-345/AZ-346/AZ-347 all receive the same class through the same constructor injection.
## Runtime Completeness
- **Named capability**: `DiskLightGlueMatcher` — production-default `CrossDomainMatcher` for cross-domain feature matching (architecture / E-C3 / `solution.md` / D-C3-1 / AC-1.1 + AC-2.2 + AC-3.1).
- **Production code that must exist**: real `DiskLightGlueMatcher` calling real C7 `InferenceRuntime` with real TRT-compiled DISK engine; real shared `LightGlueRuntime` calls; real shared `RansacFilter` for inlier filtering + median residual; real `RollingHealthWindow.update` after each frame; real composition-root wiring.
- **Allowed external stubs**: `FakeInferenceRuntime`, `FakeLightGlueRuntime`, `FakeRansacFilter`, `FakeFdrClient`, synthetic frame fixtures for unit tests.
- **Unacceptable substitutes**: a Python+NumPy implementation of DISK forward (would not satisfy C3-PT-01 latency); a different RANSAC implementation per matcher (would defeat AZ-282 helper); skipping `RollingHealthWindow.update` on the all-failed path (would lose the health signal C5 needs); calling `LightGlueRuntime` in batch mode without per-candidate inlier breakdown; using the mean residual instead of the median (would violate INV-8).