Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,146 @@
# LightGlueRuntime Helper Module (R14 fix)
**Task**: AZ-278_lightglue_runtime
**Name**: LightGlueRuntime Helper
**Description**: Implement the shared `LightGlueRuntime` helper that owns the LightGlue inference engine handle for both C2.5 (single-pair inlier counting) and C3 (heavier matching pass). This is the structural fix for R14 (the original C2.5 ↔ C3 import cycle): the runtime sits at Layer 1 with no `components.*` imports, so the cycle becomes impossible to express. The runtime uses a single CUDA stream and concurrent access is forbidden by contract; the composition root binds it to the single F3 hot-path thread.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.lightglue_runtime (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-278
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/03_helper_lightglue_runtime.md` — design rationale and R14 context.
## Problem
C2.5 (Re-rank) and C3 (CrossDomainMatcher) both call LightGlue. In cycle 1 of `_docs/02_document/epics.md`, LightGlue ownership was ambiguous and produced R14: a circular import / runtime dependency between C2.5 and C3, because both stages of the "K=10 → N=3 funnel" wanted to own the engine. Without a shared runtime:
- The engine is built / loaded twice, doubling GPU memory at takeoff (Tier-2 has only 8 GB).
- C2.5 and C3 drift on engine version pinning, producing inconsistent matches.
- The import cycle is a recurring footgun: any future refactor will tempt one component to import from the other.
## Outcome
- A single `LightGlueRuntime` instance is constructed once at takeoff by the composition root from C7's `deserialize_engine(LIGHTGLUE_ENGINE_CACHE_ENTRY)` and is constructor-injected into BOTH C2.5 and C3.
- The C2.5 ↔ C3 import cycle is structurally impossible: the runtime lives at Layer 1 (`helpers/`) and imports zero `components.*` modules. Both consumers depend on the helper; neither depends on the other.
- Concurrent access is rejected at runtime by an explicit guard (`LightGlueConcurrentAccessError`), preserving the single-CUDA-stream invariant. The composition root binds the runtime to the single F3 hot-path thread; AC-4 of the contract is the canary that catches future composition-root mistakes.
- The helper exposes no `set_*` / `update_*` methods — once constructed, the runtime's behaviour is fixed.
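A minimal sketch of the wiring described above, assuming hypothetical consumer classes `ReRanker` (standing in for C2.5) and `CrossDomainMatcher` (the C3 name from this spec, with an assumed shape), plus a placeholder object in place of C7's `deserialize_engine(LIGHTGLUE_ENGINE_CACHE_ENTRY)` result. The hypothetical `build_matchers` function builds exactly one runtime and injects the same instance into both consumers, so neither consumer module needs to import the other.

```python
from dataclasses import dataclass


class LightGlueRuntime:
    """Placeholder for the real helper; only object identity matters in this sketch."""

    def __init__(self, engine_handle: object) -> None:
        self._engine = engine_handle


@dataclass(frozen=True)
class ReRanker:  # hypothetical C2.5 consumer: receives the runtime, never builds one
    runtime: LightGlueRuntime


@dataclass(frozen=True)
class CrossDomainMatcher:  # hypothetical C3 consumer shape
    runtime: LightGlueRuntime


def build_matchers(engine_handle: object) -> tuple[ReRanker, CrossDomainMatcher]:
    # Composition root: construct ONE runtime and inject the same instance into both.
    runtime = LightGlueRuntime(engine_handle)
    return ReRanker(runtime), CrossDomainMatcher(runtime)


# Placeholder engine handle; in production this would come from C7's deserialize_engine(...).
rerank, matcher = build_matchers(engine_handle=object())
print(rerank.runtime is matcher.runtime)  # True
```

Because both consumers depend only on the injected helper, the dependency graph stays a fan-in to Layer 1 rather than an edge between C2.5 and C3.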
## Scope
### Included
- `LightGlueRuntime(engine_handle: EngineHandle)` constructor.
- `match(features_a: KeypointSet, features_b: KeypointSet) -> CorrespondenceSet` — single-pair path used by C2.5.
- `match_batch(features_a_list, features_b_list) -> list[CorrespondenceSet]` — batch path used by C3.
- `descriptor_dim() -> int` accessor for shape validation upstream of `match`.
- Concurrent-access guard that raises `LightGlueConcurrentAccessError` on overlapping `match` / `match_batch` entries.
- `LightGlueRuntimeError` (construction / dim mismatch) and `LightGlueConcurrentAccessError` (concurrent entry) exception types.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md`.
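The public surface listed above can be sketched as follows. This is an illustrative sketch, not the frozen contract: the `KeypointSet` fields, the `CorrespondenceSet` shape (index pairs), the `EngineHandle.forward(a, b)` signature (the spec elides its arguments) and `StubEngine` are all assumptions, and `match_batch` loops per pair instead of issuing one batched engine call. It shows the constructor guard (AC-5), descriptor-dim validation naming both dims (AC-3), input-order batch results (AC-2), wrapping of engine-internal exceptions, and a non-blocking entry guard that raises rather than serialising callers (AC-4).

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Protocol, Sequence


class LightGlueRuntimeError(Exception):
    """Construction failure, descriptor-dim mismatch, or a wrapped engine error."""


class LightGlueConcurrentAccessError(Exception):
    """A second caller entered match/match_batch while a call was in flight."""


@dataclass(frozen=True)
class KeypointSet:  # hypothetical stand-in for the _types definition
    descriptor_dim: int
    descriptors: tuple[tuple[float, ...], ...]


CorrespondenceSet = tuple[tuple[int, int], ...]  # assumed: (index_in_a, index_in_b) pairs


class EngineHandle(Protocol):  # assumed Protocol surface; real forward(...) args are unspecified
    descriptor_dim: int

    def forward(self, a: KeypointSet, b: KeypointSet) -> Sequence[tuple[int, int]]: ...


class LightGlueRuntime:
    def __init__(self, engine_handle: EngineHandle) -> None:
        if engine_handle is None:
            raise LightGlueRuntimeError("engine_handle must not be None")
        self._engine = engine_handle
        self._guard = threading.Lock()  # only ever acquired non-blocking

    def descriptor_dim(self) -> int:
        return self._engine.descriptor_dim

    def match(self, features_a: KeypointSet, features_b: KeypointSet) -> CorrespondenceSet:
        with self._single_entry():
            return self._match_pair(features_a, features_b)

    def match_batch(
        self, features_a_list: Sequence[KeypointSet], features_b_list: Sequence[KeypointSet]
    ) -> list[CorrespondenceSet]:
        if len(features_a_list) != len(features_b_list):
            raise LightGlueRuntimeError("match_batch needs equally long input lists")
        with self._single_entry():
            # Results stay in input order (AC-2).
            return [self._match_pair(a, b) for a, b in zip(features_a_list, features_b_list)]

    @contextmanager
    def _single_entry(self) -> Iterator[None]:
        # Raise on contention instead of waiting: concurrent calls are a bug, not a queue.
        if not self._guard.acquire(blocking=False):
            raise LightGlueConcurrentAccessError("LightGlueRuntime is single-thread by contract")
        try:
            yield
        finally:
            self._guard.release()

    def _match_pair(self, a: KeypointSet, b: KeypointSet) -> CorrespondenceSet:
        expected = self._engine.descriptor_dim
        for name, feats in (("features_a", a), ("features_b", b)):
            if feats.descriptor_dim != expected:
                raise LightGlueRuntimeError(
                    f"{name}: expected descriptor_dim={expected}, got {feats.descriptor_dim}"
                )
        try:
            pairs = self._engine.forward(a, b)
        except Exception as exc:  # wrap so only the two helper exception types escape
            raise LightGlueRuntimeError(f"engine forward failed: {exc}") from exc
        return tuple((int(i), int(j)) for i, j in pairs)


class StubEngine:  # hypothetical test double: pairs keypoint i with keypoint i
    descriptor_dim = 4

    def forward(self, a: KeypointSet, b: KeypointSet) -> list[tuple[int, int]]:
        return [(i, i) for i in range(min(len(a.descriptors), len(b.descriptors)))]


runtime = LightGlueRuntime(StubEngine())
a = KeypointSet(4, ((0.0,) * 4, (1.0,) * 4))
b = KeypointSet(4, ((0.5,) * 4,) * 3)
print(runtime.match(a, b))  # ((0, 0), (1, 1))
```

The guard is released in `finally`, so a dim-mismatch error does not leave the runtime locked for the next frame.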
### Excluded
- Engine compilation / serialisation — C7.
- Engine filename schema — `helpers.engine_filename_schema` (separate task in this epic).
- Engine cache management / takeoff load — C10.
- Backbone-specific feature extraction (DISK / ALIKED / XFeat) — C3 / C7.
- Multi-GPU / multi-stream / mixed-backbone — out of scope for v1.0.0.
- The `EngineHandle` Protocol itself — owned by `_types/manifests.py` (AZ-263) so Layer 1 can reference it without depending on C7.
## Acceptance Criteria
**AC-1: Single-pair match (C2.5 path)**
Given a pair of `KeypointSet`s with matching descriptor dim and a synthetic-overlap fixture
When `match(features_a, features_b)` runs
Then a `CorrespondenceSet` is returned with `len > 0` and the inlier-count helper used by C2.5 finds the expected count
**AC-2: Batch match (C3 path)**
Given three pairs of `KeypointSet`s
When `match_batch([a1, a2, a3], [b1, b2, b3])` runs
Then three `CorrespondenceSet`s are returned in input order; per-pair invariants match the single-pair path
**AC-3: Descriptor-dim mismatch rejected**
Given features whose `descriptor_dim` does not match the engine's expected dim
When `match` runs
Then `LightGlueRuntimeError` is raised with a message naming both the expected and actual dims
**AC-4: Concurrent access rejected**
Given two threads that call `match` simultaneously on the same `LightGlueRuntime` instance
When the second call enters
Then `LightGlueConcurrentAccessError` is raised in the second thread; the first thread completes normally
**AC-5: Construction-time guard**
Given `LightGlueRuntime(engine_handle=None)`
When construction runs
Then `LightGlueRuntimeError` is raised mentioning `engine_handle`
**AC-6: No upward imports — R14 structural fix**
Given the helper module
When a static-import check runs across `gps_denied_onboard.helpers.lightglue_runtime`
Then it imports ONLY from `_types`, numpy, and stdlib — NO imports from `gps_denied_onboard.components.*` (verified by importlinter or grep gate in CI)
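A minimal stdlib sketch of the "grep gate" option for AC-6, using Python's `ast` module instead of a literal grep so that relative imports such as `from .. import components` are caught too. The function names and the default package string are hypothetical; an import-linter `forbidden` contract would be the declarative alternative.

```python
import ast

FORBIDDEN = "gps_denied_onboard.components"


def _resolve(node: ast.ImportFrom, package: str) -> str:
    """Turn a (possibly relative) `from X import ...` into an absolute module name."""
    if node.level == 0:
        return node.module or ""
    parts = package.split(".")
    base = ".".join(parts[: len(parts) - (node.level - 1)])  # level 1 = current package
    return f"{base}.{node.module}" if node.module else base


def forbidden_imports(source: str, package: str = "gps_denied_onboard.helpers") -> list[str]:
    """Return every imported name in `source` that falls under components.*."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = _resolve(node, package)
            # `from pkg import components` pulls in the forbidden package by attribute name.
            candidates = [module] + [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in candidates:
            if (name == FORBIDDEN or name.startswith(FORBIDDEN + ".")) and name not in hits:
                hits.append(name)
    return hits


print(forbidden_imports("from gps_denied_onboard._types import KeypointSet\n"))  # []
```

In CI the gate would read `helpers/lightglue_runtime.py` and fail the job whenever the returned list is non-empty.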
**AC-7: Determinism downstream of the engine**
Given the same `(features_a, features_b)` pair matched twice with the same `engine_handle`
When `match` runs both times
Then both `CorrespondenceSet` outputs are byte-equal (engine determinism is a C7 concern; this AC asserts the helper itself adds no non-determinism)
## Non-Functional Requirements
**Performance**
- `match` p99 ≤ 30 ms on Tier-2 with the production DISK+LightGlue engine for a typical candidate pair from the K=10 set (matches the per-frame budget for C2.5's K=10 → N=3 funnel).
- Helper-level overhead (excluding the engine call itself) ≤ 100 µs — verified via a benchmark that swaps in a stub engine handle.
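One way to structure the stub-engine overhead benchmark is sketched below, assuming a hypothetical `overhead_us` harness that times any zero-argument callable with `time.perf_counter_ns` and reports the mean and a nearest-rank p99. With a no-op stub engine, the measured time approximates helper overhead plus the (negligible) stub cost; the Tier-2 fixture itself is not reproduced here.

```python
import math
import statistics
import time
from typing import Callable


def overhead_us(call: Callable[[], object], iterations: int = 10_000, warmup: int = 100) -> dict[str, float]:
    """Time `call` repeatedly; report mean and nearest-rank p99 latency in microseconds."""
    for _ in range(warmup):  # let caches and lazy initialisation settle before sampling
        call()
    samples: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        call()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    samples.sort()
    p99 = samples[math.ceil(0.99 * len(samples)) - 1]
    return {"mean_us": statistics.fmean(samples), "p99_us": p99}


# Hypothetical usage: overhead_us(lambda: runtime.match(a, b)) on a stub-engine runtime,
# then compare the reported figures against the 100 µs helper-overhead budget.
baseline = overhead_us(lambda: None, iterations=1_000, warmup=10)
```

Measuring an empty lambda alongside the real call gives a baseline for the harness's own timing cost.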
**Reliability**
- `LightGlueRuntimeError` and `LightGlueConcurrentAccessError` are the ONLY exception types the public surface raises. Engine-internal exceptions MUST be wrapped.
- Pure-deterministic given a deterministic engine; the helper itself adds no random state.
**Concurrency**
- Single-thread by contract. The concurrent-access guard is the runtime invariant detector — any composition-root regression that wires the runtime into multiple threads is caught immediately rather than producing GPU memory corruption.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | single-pair match on synthetic-overlap fixture | non-empty `CorrespondenceSet` |
| AC-2 | batch of 3 pairs | three results in input order; per-pair invariants match AC-1 |
| AC-3 | dim-mismatched features | `LightGlueRuntimeError`; message names expected & actual dims |
| AC-4 | two threads call `match` simultaneously | one succeeds; the second raises `LightGlueConcurrentAccessError` |
| AC-5 | construct with `engine_handle=None` | `LightGlueRuntimeError` |
| AC-6 | importlinter / grep gate over `helpers/lightglue_runtime.py` | no `components.*` imports |
| AC-7 | same pair matched twice | byte-equal outputs (with deterministic stub engine) |
| NFR-perf | microbench `match` overhead with stub engine (10k iterations on Tier-2 fixture) | helper overhead ≤ 100 µs |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md` v1.0.0.
- Layer 1 Foundation only. NO upward imports — this is the load-bearing constraint for the R14 fix.
- The `EngineHandle` Protocol must be defined in `_types/manifests.py` (AZ-263 / E-BOOT) so this helper can reference it without importing C7. If `_types/manifests.py` does not yet define the Protocol surface (`forward(...)`, `descriptor_dim`), this task adds it — that is the only `_types` edit allowed by this task.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
## Risks & Mitigation
**Risk 1: Composition root accidentally creates two runtimes (one for C2.5, one for C3)**
- *Risk*: Future composition-root refactor instantiates `LightGlueRuntime` twice; engine memory doubles, behaviour drifts.
- *Mitigation*: The composition-root contract test (E-CC-CONF / AZ-246, AZ-269/AZ-270 in scope) already verifies cardinality of cross-cutting helpers. This task's contract documents that EXACTLY ONE instance is expected; the composition-root validator is the enforcement point.
**Risk 2: Concurrent-access guard introduces hot-path overhead**
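- *Risk*: A naive blocking `threading.Lock` around every `match` call adds per-call overhead on the hot path and, under contention, stalls the second caller for a full engine call instead of surfacing the composition-root bug.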
- *Mitigation*: The guard uses a non-blocking `lock.acquire(blocking=False)` check that simply RAISES on contention rather than serialising callers. (A `threading.local()` flag cannot serve here: each thread sees only its own copy, so it cannot observe another thread inside `match`.) The contract is "concurrent calls are a bug", not "serialise concurrent callers". NFR-perf microbench validates the overhead budget.
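The raise-on-contention pattern can be sketched as a small context manager; `SingleEntryGuard` is a hypothetical name. The demo uses two `threading.Event`s to force the overlap deterministically, which is also one way to make the AC-4 unit test non-flaky instead of relying on "simultaneous" timing.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class LightGlueConcurrentAccessError(Exception):
    pass


class SingleEntryGuard:
    """Raise instead of waiting when a second caller enters; never serialises callers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    @contextmanager
    def enter(self) -> Iterator[None]:
        if not self._lock.acquire(blocking=False):  # returns False immediately if held
            raise LightGlueConcurrentAccessError("concurrent entry into LightGlueRuntime")
        try:
            yield
        finally:
            self._lock.release()


guard = SingleEntryGuard()
inside, release = threading.Event(), threading.Event()


def slow_call() -> None:
    with guard.enter():
        inside.set()    # first caller now holds the guard
        release.wait()  # simulate a long engine call


worker = threading.Thread(target=slow_call)
worker.start()
inside.wait()  # guarantee the overlap before the second entry
try:
    with guard.enter():
        second_result = "entered"
except LightGlueConcurrentAccessError:
    second_result = "rejected"
release.set()
worker.join()
print(second_result)  # rejected
```

Note that a plain `threading.Lock` is not reentrant, so the same thread entering twice (for example `match_batch` calling the public `match`) would also raise; internal paths should call an unguarded helper.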
**Risk 3: A future backbone needs a different match shape**
- *Risk*: A new feature backbone produces 5-tuple correspondences instead of the current 4-tuple (e.g., adds confidence per match).
- *Mitigation*: The contract version bump path is documented (`Versioning Rules` section). Adding a field is non-breaking IF consumers tolerate the extra field; otherwise it is a major-version contract change with a deprecation pass.
## Runtime Completeness
- **Named capability**: shared LightGlue inference runtime with single-CUDA-stream guarantee + R14 structural cycle fix (architecture / E-CC-HELPERS / `03_helper_lightglue_runtime.md`).
- **Production code that must exist**: real `EngineHandle`-backed match dispatch; real concurrent-access guard; real descriptor-dim validation.
- **Allowed external stubs**: a deterministic stub `EngineHandle` is allowed in tests (and recommended for AC-7 determinism) but production paths use C7's real engine.
- **Unacceptable substitutes**: replacing the raising concurrent-access guard with a blocking `threading.Lock` acquire (silently serialising callers); allowing each consumer to construct its own runtime; reintroducing a C2.5 → C3 (or C3 → C2.5) import to "share state". Any of those reintroduces R14.
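A deterministic stub `EngineHandle` for AC-7 and the NFR-perf microbench could look like the sketch below. The class name, the `as_bytes` helper, the simplified `KeypointSet`, and the `forward(a, b)` signature are all hypothetical. Matches are derived from a SHA-256 digest of each descriptor, so output depends only on the inputs; `hashlib` is used rather than Python's built-in `hash()`, whose value for `str`/`bytes` is salted per process.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class KeypointSet:  # hypothetical stand-in for the _types definition
    descriptor_dim: int
    descriptors: tuple[tuple[float, ...], ...]


class DeterministicStubEngine:
    """Test-only engine: output depends only on the inputs, never on RNG or time."""

    descriptor_dim = 4

    def forward(self, a: KeypointSet, b: KeypointSet) -> list[tuple[int, int]]:
        if not b.descriptors:
            return []
        pairs = []
        for i, desc in enumerate(a.descriptors):
            digest = hashlib.sha256(struct.pack(f"<{len(desc)}d", *desc)).digest()
            j = int.from_bytes(digest[:4], "little") % len(b.descriptors)
            pairs.append((i, j))
        return pairs


def as_bytes(pairs: list[tuple[int, int]]) -> bytes:
    """Serialise correspondences so two runs can be compared byte-for-byte (AC-7)."""
    flat = [k for pair in pairs for k in pair]
    return struct.pack(f"<{len(flat)}q", *flat)


engine = DeterministicStubEngine()
a = KeypointSet(4, ((0.0, 1.0, 2.0, 3.0), (4.0, 5.0, 6.0, 7.0)))
b = KeypointSet(4, ((0.5,) * 4,) * 3)
print(as_bytes(engine.forward(a, b)) == as_bytes(engine.forward(a, b)))  # True
```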
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/lightglue_runtime.md`.
Consumers MUST read that file — not this task spec — to discover the interface.