Oleksandr Bezdieniezhnykh 33486588de [AZ-271] [AZ-276] [AZ-278] [AZ-282] Finish cross-cutting helpers + relax opencv pin
E-CC-HELPERS closes with the three remaining Layer-1 helpers and
E-CC-CONF closes with the env > YAML > defaults precedence test
gate. All four tickets ship with frozen public surfaces, hermetic
unit tests, and no upward (components.*) imports.

* AZ-271 — tests/unit/shared/config/test_precedence.py (5 ACs + smoke
  test + helper that names the layer in failure messages).
* AZ-282 — helpers/ransac_filter.py: static RansacFilter +
  RansacResult; cv2.setRNGSeed(0) for byte-equal determinism;
  median residual semantics pinned by contract.
* AZ-276 — helpers/imu_preintegrator.py + make_imu_preintegrator;
  GTSAM PreintegratedCombinedMeasurements; strict-monotonic ts_ns
  guard runs before any state mutation. Adjacent hygiene:
  _types/nav.py ImuSample/ImuWindow now use ts_ns:int and the
  spec-mandated ImuBias dataclass.
* AZ-278 — helpers/lightglue_runtime.py: structural R14 fix.
  LightGlueRuntime + non-blocking concurrent-access guard that
  raises rather than serialising. EngineHandle Protocol in
  _types/manifests.py + KeypointSet/CorrespondenceSet in
  _types/matching.py (Protocol surface adds approved by spec).

Dependency conflict (Finding 1, user-approved): gtsam 4.2 (PyPI) is
numpy-1.x-ABI only; opencv-python>=4.12 needs numpy>=2 at runtime.
Resolution: opencv-python pin relaxed to >=4.11.0.86,<4.12. The
D-CROSS-CVE-1 ratchet at ci/opencv_pin_gate.py is held at 4.11.0
with the original 4.12.0 floor restored once a numpy-2-compatible
gtsam wheel ships. Full replay procedure in
_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md.

Tests: 294 passed, 2 skipped (cmake/actionlint env-skips,
pre-existing). 43 new tests added for batch 5. Ruff check + format
clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:23:33 +03:00


LightGlueRuntime Helper Module (R14 fix)

Task: AZ-278_lightglue_runtime
Name: LightGlueRuntime Helper
Description: Implement the shared LightGlueRuntime helper that owns the LightGlue inference engine handle for both C2.5 (single-pair inlier counting) and C3 (heavier matching pass). This is the structural fix for R14 (the original C2.5 ↔ C3 import cycle): the runtime sits at Layer 1 with no components.* imports, so the cycle becomes impossible to express. Single CUDA stream; concurrent access forbidden by contract; composition root binds to the single F3 hot-path thread.
Complexity: 3 points
Dependencies: AZ-263_initial_structure
Component: shared.helpers.lightglue_runtime (cross-cutting; epic AZ-264 / E-CC-HELPERS)
Tracker: AZ-278
Epic: AZ-264 (E-CC-HELPERS)

Document Dependencies

  • _docs/02_document/contracts/shared_helpers/lightglue_runtime.md — frozen public interface this task produces.
  • _docs/02_document/common-helpers/03_helper_lightglue_runtime.md — design rationale and R14 context.

Problem

C2.5 (Re-rank) and C3 (CrossDomainMatcher) both call LightGlue. In cycle 1 of _docs/02_document/epics.md, LightGlue ownership was ambiguous and produced R14: a circular import / runtime dependency between C2.5 and C3 (the "K=10 → N=3 funnel" both wanted to own the engine). Without a shared runtime:

  • The engine is built / loaded twice, doubling GPU memory at takeoff (Tier-2 has only 8 GB).
  • C2.5 and C3 drift on engine version pinning, producing inconsistent matches.
  • Their import cycle is a recurring footgun: any future refactor will tempt one to import from the other.

Outcome

  • A single LightGlueRuntime instance is constructed once at takeoff by the composition root from C7's deserialize_engine(LIGHTGLUE_ENGINE_CACHE_ENTRY) and is constructor-injected into BOTH C2.5 and C3.
  • The C2.5 ↔ C3 import cycle is structurally impossible: the runtime lives at Layer 1 (helpers/) and imports zero components.* modules. Both consumers depend on the helper; neither depends on the other.
  • Concurrent access is rejected at runtime by an explicit guard (LightGlueConcurrentAccessError), preserving the single-CUDA-stream invariant. The composition root binds the runtime to the single F3 hot-path thread; AC-4 of the contract is the canary that catches future composition-root mistakes.
  • The helper exposes no set_* / update_* methods — once constructed, the runtime's behaviour is fixed.
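
The wiring in the bullets above can be sketched as follows. This is a minimal illustration, not the project's composition root: ReRanker and CrossDomainMatcher are hypothetical stand-ins for the C2.5 and C3 consumers, build_matching_stack is an invented name, and the LightGlueRuntime class here is a placeholder that only carries identity.

```python
from dataclasses import dataclass


class LightGlueRuntime:
    """Placeholder for the real helper; only object identity matters here."""

    def __init__(self, engine_handle: object) -> None:
        self._engine_handle = engine_handle


@dataclass(frozen=True)
class ReRanker:  # hypothetical C2.5 consumer
    runtime: LightGlueRuntime


@dataclass(frozen=True)
class CrossDomainMatcher:  # hypothetical C3 consumer
    runtime: LightGlueRuntime


def build_matching_stack(engine_handle: object) -> tuple[ReRanker, CrossDomainMatcher]:
    """Construct exactly one runtime and inject it into both consumers."""
    runtime = LightGlueRuntime(engine_handle)
    return ReRanker(runtime), CrossDomainMatcher(runtime)


reranker, matcher = build_matching_stack(engine_handle=object())
# Both consumers hold the same instance; neither imports the other.
shares_one_runtime = reranker.runtime is matcher.runtime
```

Because both consumers receive the runtime through their constructors, a cardinality check in the composition-root tests only has to compare object identity.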

Scope

Included

  • LightGlueRuntime(engine_handle: EngineHandle) constructor.
  • match(features_a: KeypointSet, features_b: KeypointSet) -> CorrespondenceSet — single-pair path used by C2.5.
  • match_batch(features_a_list, features_b_list) -> list[CorrespondenceSet] — batch path used by C3.
  • descriptor_dim() -> int accessor for shape validation upstream of match.
  • Concurrent-access guard that raises LightGlueConcurrentAccessError on overlapping match / match_batch entries.
  • LightGlueRuntimeError (construction / dim mismatch) and LightGlueConcurrentAccessError (concurrent entry) exception types.
  • Public interface contract published at _docs/02_document/contracts/shared_helpers/lightglue_runtime.md.
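
A rough sketch of how the surface listed above might fit together, under several assumptions: the KeypointSet and CorrespondenceSet shapes are invented for illustration (the real types live in _types/matching.py), EngineHandle is reduced to a placeholder alias, descriptor_dim is assumed to be a plain attribute on the handle, and the concurrent-access guard is left out to keep the example short. The frozen contract file remains the authority on the actual interface.

```python
from dataclasses import dataclass
from typing import Any

# Placeholder for the EngineHandle Protocol owned by _types/manifests.py.
EngineHandle = Any


class LightGlueRuntimeError(Exception):
    """Construction failure or descriptor-dim mismatch."""


@dataclass(frozen=True)
class KeypointSet:  # hypothetical shape, for illustration only
    descriptor_dim: int
    descriptors: tuple


@dataclass(frozen=True)
class CorrespondenceSet:  # hypothetical shape, for illustration only
    pairs: tuple


class LightGlueRuntime:
    def __init__(self, engine_handle: EngineHandle) -> None:
        if engine_handle is None:
            raise LightGlueRuntimeError("engine_handle must not be None")
        self._engine = engine_handle

    def descriptor_dim(self) -> int:
        # Assumption: the handle exposes descriptor_dim as an attribute.
        return self._engine.descriptor_dim

    def _check_dim(self, features: KeypointSet) -> None:
        expected = self.descriptor_dim()
        if features.descriptor_dim != expected:
            raise LightGlueRuntimeError(
                f"descriptor dim mismatch: expected {expected}, "
                f"got {features.descriptor_dim}"
            )

    def match(self, features_a: KeypointSet, features_b: KeypointSet) -> CorrespondenceSet:
        self._check_dim(features_a)
        self._check_dim(features_b)
        return self._engine.forward(features_a, features_b)

    def match_batch(self, features_a_list, features_b_list) -> list:
        # Results keep input order; a real engine may batch on the GPU instead.
        return [self.match(a, b) for a, b in zip(features_a_list, features_b_list)]


class StubEngine:  # deterministic test double, not C7's engine
    descriptor_dim = 256

    def forward(self, a: KeypointSet, b: KeypointSet) -> CorrespondenceSet:
        return CorrespondenceSet(pairs=tuple(zip(a.descriptors, b.descriptors)))


runtime = LightGlueRuntime(StubEngine())
a = KeypointSet(descriptor_dim=256, descriptors=(1, 2, 3))
b = KeypointSet(descriptor_dim=256, descriptors=(4, 5, 6))
result = runtime.match(a, b)
```

Note that the class has no setters: everything it needs arrives through the constructor, which is what keeps its behaviour fixed after construction.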

Excluded

  • Engine compilation / serialisation — C7.
  • Engine filename schema — helpers.engine_filename_schema (separate task in this epic).
  • Engine cache management / takeoff load — C10.
  • Backbone-specific feature extraction (DISK / ALIKED / XFeat) — C3 / C7.
  • Multi-GPU / multi-stream / mixed-backbone — out of scope for v1.0.0.
  • The EngineHandle Protocol itself — owned by _types/manifests.py (AZ-263) so Layer 1 can reference it without depending on C7.

Acceptance Criteria

AC-1: Single-pair match (C2.5 path)
Given a pair of KeypointSets with matching descriptor dim and a synthetic-overlap fixture
When match(features_a, features_b) runs
Then a CorrespondenceSet is returned with len > 0 and the inlier-count helper used by C2.5 finds the expected count

AC-2: Batch match (C3 path)
Given three pairs of KeypointSets
When match_batch([a1, a2, a3], [b1, b2, b3]) runs
Then three CorrespondenceSets are returned in input order; per-pair invariants match the single-pair path

AC-3: Descriptor-dim mismatch rejected
Given features whose descriptor_dim does not match the engine's expected dim
When match runs
Then LightGlueRuntimeError is raised with a message naming both the expected and actual dims

AC-4: Concurrent access rejected
Given two threads call match simultaneously on the same LightGlueRuntime instance
When the second call enters
Then LightGlueConcurrentAccessError is raised in the second thread; the first thread completes normally

AC-5: Construction-time guard
Given LightGlueRuntime(engine_handle=None)
When construction runs
Then LightGlueRuntimeError is raised mentioning engine_handle

AC-6: No upward imports — R14 structural fix
Given the helper module
When a static-import check runs across gps_denied_onboard.helpers.lightglue_runtime
Then it imports ONLY from _types, numpy, and stdlib — NO imports from gps_denied_onboard.components.* (verified by importlinter or grep gate in CI)
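
One way the AC-6 gate could be approximated with the standard library alone is an AST walk over the module source. This is a sketch, not the CI gate itself: forbidden_imports is an invented helper name, the c3 and c2_5 submodule names in the sample are hypothetical, and relative imports are deliberately ignored, so a real gate would need to resolve those too (or delegate to importlinter).

```python
import ast

FORBIDDEN_PREFIX = "gps_denied_onboard.components"


def _is_forbidden(name: str, prefix: str) -> bool:
    return name == prefix or name.startswith(prefix + ".")


def forbidden_imports(source: str, prefix: str = FORBIDDEN_PREFIX) -> list[str]:
    """Return the absolute imports in `source` that reach into `prefix`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b.c
            hits.extend(a.name for a in node.names if _is_forbidden(a.name, prefix))
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # from a.b import c  (also catches `from a import b` where a.b is forbidden)
            candidates = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
            if any(_is_forbidden(c, prefix) for c in candidates):
                hits.append(node.module)
    return hits


clean_source = (
    "import threading\n"
    "from gps_denied_onboard._types.matching import KeypointSet\n"
)
dirty_source = (
    "from gps_denied_onboard.components.c3 import matcher\n"
    "import gps_denied_onboard.components.c2_5\n"
    "from gps_denied_onboard import components\n"
)
clean_hits = forbidden_imports(clean_source)
dirty_hits = forbidden_imports(dirty_source)
```

In a test, the helper would be fed the text of helpers/lightglue_runtime.py and the assertion would be that the returned list is empty.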

AC-7: Determinism downstream of the engine
Given the same (features_a, features_b) pair matched twice with the same engine_handle
When match runs both times
Then both CorrespondenceSet outputs are byte-equal (engine determinism is a C7 concern; this AC asserts the helper itself adds no non-determinism)

Non-Functional Requirements

Performance

  • match p99 ≤ 30 ms on Tier-2 with the production DISK+LightGlue engine on a typical K=10 candidate pair (matches the per-frame budget for C2.5's K=10 → N=3 funnel).
  • Helper-level overhead (excluding the engine call itself) ≤ 100 µs — verified via a benchmark that swaps in a stub engine handle.
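
The stub-engine benchmark mentioned above could look roughly like the sketch below. It is illustrative only: StubRuntime is a hypothetical stand-in for the helper wrapped around a no-op engine, the function names are invented, and the spec does not say which statistic the 100 µs budget applies to, so both the median and p99 are computed here.

```python
import time


def sample_call_ns(fn, iterations: int = 10_000) -> list[int]:
    """Time `fn()` `iterations` times and return per-call durations in ns."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentile(samples, fraction: float):
    """Nearest-rank style percentile over a list of samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


class StubRuntime:  # hypothetical: helper logic around an engine that does nothing
    def match(self, features_a, features_b):
        return ()


runtime = StubRuntime()
samples = sample_call_ns(lambda: runtime.match(None, None), iterations=10_000)
median_ns = percentile(samples, 0.50)
p99_ns = percentile(samples, 0.99)
BUDGET_NS = 100_000  # 100 µs helper-overhead budget from the NFR
within_budget = p99_ns <= BUDGET_NS
```

Timing a no-op engine isolates the helper's own cost (validation, guard entry and exit) from GPU time, which is the quantity the NFR bounds.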

Reliability

  • LightGlueRuntimeError and LightGlueConcurrentAccessError are the ONLY exception types the public surface raises. Engine-internal exceptions MUST be wrapped.
  • Purely deterministic given a deterministic engine; the helper itself adds no random state.
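
The wrapping requirement can be illustrated with a small sketch. call_engine is a hypothetical internal helper, not part of the published surface, and the message format is an assumption; the point is only that anything the engine throws leaves the helper as LightGlueRuntimeError with the original exception chained.

```python
class LightGlueRuntimeError(Exception):
    """The error type the public surface is allowed to raise for engine failures."""


def call_engine(forward, *args):
    """Invoke the engine and re-raise any foreign exception as LightGlueRuntimeError."""
    try:
        return forward(*args)
    except LightGlueRuntimeError:
        raise  # already the right type; do not double-wrap
    except Exception as exc:
        # `from exc` keeps the engine's traceback reachable via __cause__.
        raise LightGlueRuntimeError(f"engine call failed: {exc!r}") from exc


def broken_forward(*_):
    raise ValueError("simulated engine failure")


try:
    call_engine(broken_forward, "a", "b")
except LightGlueRuntimeError as wrapped:
    original = wrapped.__cause__
```

Chaining with `from` means callers only ever need to catch the helper's own types, while the underlying cause stays available for logs.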

Concurrency

  • Single-threaded by contract. The concurrent-access guard detects violations of that invariant at runtime: any composition-root regression that wires the runtime into multiple threads is caught immediately rather than producing GPU memory corruption.

Unit Tests

AC Ref | What to Test | Required Outcome
AC-1 | single-pair match on synthetic-overlap fixture | non-empty CorrespondenceSet
AC-2 | batch of 3 pairs | three results in input order; per-pair invariants match AC-1
AC-3 | dim-mismatched features | LightGlueRuntimeError; message names expected & actual dims
AC-4 | two threads call match simultaneously | one succeeds; the second raises LightGlueConcurrentAccessError
AC-5 | construct with engine_handle=None | LightGlueRuntimeError
AC-6 | importlinter / grep gate over helpers/lightglue_runtime.py | no components.* imports
AC-7 | same pair matched twice | byte-equal outputs (with deterministic stub engine)
NFR-perf | microbench match overhead with stub engine (10k iterations on Tier-2 fixture) | helper overhead ≤ 100 µs

Constraints

  • Public surface frozen by _docs/02_document/contracts/shared_helpers/lightglue_runtime.md v1.0.0.
  • Layer 1 Foundation only. NO upward imports — this is the load-bearing constraint for the R14 fix.
  • The EngineHandle Protocol must be defined in _types/manifests.py (AZ-263 / E-BOOT) so this helper can reference it without importing C7. If _types/manifests.py does not yet define the Protocol surface (forward(...), descriptor_dim), this task adds it — that is the only _types edit allowed by this task.
  • No new dependency beyond what AZ-263 / E-BOOT pinned.
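
To show why a Protocol in _types/manifests.py lets this helper avoid importing C7, here is a minimal structural-typing sketch. The member list follows the surface named above (forward(...), descriptor_dim), but the forward signature is left open because the spec elides it, descriptor_dim is assumed to be a read-only property, and FakeC7Engine is a hypothetical class used only for the demonstration.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EngineHandle(Protocol):
    """Structural type: any object with these members conforms."""

    @property
    def descriptor_dim(self) -> int: ...

    # Signature intentionally open; the spec only names forward(...).
    def forward(self, *inputs: Any) -> Any: ...


class FakeC7Engine:  # hypothetical; note it does NOT inherit from EngineHandle
    descriptor_dim = 256

    def forward(self, *inputs: Any) -> Any:
        return inputs


def describe(handle: EngineHandle) -> str:
    return f"engine with descriptor_dim={handle.descriptor_dim}"


engine_conforms = isinstance(FakeC7Engine(), EngineHandle)
plain_object_conforms = isinstance(object(), EngineHandle)
summary = describe(FakeC7Engine())
```

Since conformance is structural, the concrete engine class can live in C7 and never be imported by Layer 1; only the Protocol crosses the layer boundary.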

Risks & Mitigation

Risk 1: Composition root accidentally creates two runtimes (one for C2.5, one for C3)

  • Risk: Future composition-root refactor instantiates LightGlueRuntime twice; engine memory doubles, behaviour drifts.
  • Mitigation: The composition-root contract test (E-CC-CONF / AZ-246, AZ-269/AZ-270 in scope) already verifies cardinality of cross-cutting helpers. This task's contract documents that EXACTLY ONE instance is expected; the composition-root validator is the enforcement point.

Risk 2: Concurrent-access guard introduces hot-path overhead

  • Risk: A naive blocking threading.Lock on every match call adds hundreds of µs.
  • Mitigation: The guard uses a non-blocking check, either a threading.local()-style flag or a Lock.acquire(blocking=False) pattern, that simply RAISES on contention rather than serialising callers. The contract is "concurrent calls are a bug", not "serialise concurrent callers". The NFR-perf microbench validates the overhead budget.
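
A minimal sketch of the raise-on-contention pattern, assuming the Lock.acquire(blocking=False) variant. SingleEntryGuard and its entered() method are invented names, and the exception's base class is an assumption; the real guard may be shaped differently.

```python
import threading
from contextlib import contextmanager


class LightGlueConcurrentAccessError(Exception):
    """Raised when a second caller enters while another call is in flight."""


class SingleEntryGuard:  # hypothetical helper, for illustration only
    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextmanager
    def entered(self):
        # Non-blocking acquire: returns False immediately if the lock is held.
        if not self._lock.acquire(blocking=False):
            raise LightGlueConcurrentAccessError("runtime is already in use")
        try:
            yield
        finally:
            self._lock.release()


guard = SingleEntryGuard()
inside = threading.Event()
release = threading.Event()


def first_caller() -> None:
    with guard.entered():
        inside.set()             # signal that the first call is in flight
        release.wait(timeout=5)  # hold the guard until the main thread has tried


worker = threading.Thread(target=first_caller)
worker.start()
inside.wait(timeout=5)
try:
    with guard.entered():
        second_call_rejected = False
except LightGlueConcurrentAccessError:
    second_call_rejected = True
finally:
    release.set()
    worker.join()
```

The events make the overlap deterministic, which is also how an AC-4 test can avoid relying on timing. The second caller fails fast instead of waiting, so the uncontended cost is one non-blocking acquire and one release per call.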

Risk 3: A future backbone needs a different match shape

  • Risk: A new feature backbone produces 5-tuple correspondences instead of the current 4-tuple (e.g., adds confidence per match).
  • Mitigation: The contract version bump path is documented (Versioning Rules section). Adding a field is non-breaking IF consumers tolerate the extra field; otherwise it is a major-version contract change with a deprecation pass.

Runtime Completeness

  • Named capability: shared LightGlue inference runtime with single-CUDA-stream guarantee + R14 structural cycle fix (architecture / E-CC-HELPERS / 03_helper_lightglue_runtime.md).
  • Production code that must exist: real EngineHandle-backed match dispatch; real concurrent-access guard; real descriptor-dim validation.
  • Allowed external stubs: a deterministic stub EngineHandle is allowed in tests (and recommended for AC-7 determinism) but production paths use C7's real engine.
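  • Unacceptable substitutes: replacing the raise-on-contention guard with a blocking threading.Lock (silently serialising callers); allowing each consumer to construct its own runtime; reintroducing a C2.5 → C3 (or C3 → C2.5) import to "share state". Letting each consumer build its own runtime or adding a cross-component import reintroduces R14; a blocking lock hides the composition-root bug the guard exists to expose.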

Contract

This task produces the contract at _docs/02_document/contracts/shared_helpers/lightglue_runtime.md. Consumers MUST read that file — not this task spec — to discover the interface.