mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 17:01:14 +00:00
[AZ-341] C2 FAISS HNSW retrieve wiring (FaissBridge + AZ-507 cut)
Shared retrieve_topk plumbing for every concrete C2 VprStrategy:
- FaissBridge centralises the c6 search_topk → VprResult pipeline,
the defended-in-depth INV-4 check (exactly k, distance-ascending),
the WARN-threshold check on distances[0], optional per-frame DEBUG
log, and one `vpr.retrieve_topk` FDR record per call with latency
measurement.
- DescriptorIndexCut Protocol — consumer-side structural cut of c6
DescriptorIndex.search_topk (AZ-507); keeps c2_vpr c6-import-free.
- C2VprConfig gains warn_top1_threshold + debug_per_frame_distances
knobs with validators.
- KNOWN_PAYLOAD_KEYS registers vpr.retrieve_topk for the FDR record
schema with payload {frame_id, backbone_label, top10_distances,
latency_us}; companion fixture added to the AZ-272 roundtrip suite.
- 22 unit tests cover AC-1..AC-11 + NFR-perf microbench (p95 ≤ 0.5 ms)
+ constructor and retrieve-argument validation.
Verdict: PASS_WITH_WARNINGS (2 Low findings — duplicated ISO-ts
helper across c2/c5/c11/c12, captured in AZ-508 hygiene PBI;
spec-listed but unused `normaliser` parameter dropped — INV-3 makes
the embedding L2-normalised at the strategy's `embed_query`).
Tests: 1565 passed / 80 skipped (was 1543; +22 new tests).
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,213 @@
|
||||
# Batch 45 — Code Review
|
||||
|
||||
**Tasks**: AZ-341 (C2 FAISS HNSW Retrieve Wiring)
|
||||
**Cycle**: 1
|
||||
**Reviewer**: autodev
|
||||
**Verdict**: **PASS_WITH_WARNINGS**
|
||||
|
||||
## Scope reviewed
|
||||
|
||||
Production code:
|
||||
|
||||
- `src/gps_denied_onboard/components/c2_vpr/_faiss_bridge.py` (NEW —
|
||||
`FaissBridge` class with constructor validation, `retrieve(...)`
|
||||
method, INV-4 defensive check, WARN-threshold + DEBUG-flag log
|
||||
emission, FDR record + latency capture).
|
||||
- `src/gps_denied_onboard/components/c2_vpr/descriptor_index_cut.py`
|
||||
(NEW — `DescriptorIndexCut` Protocol mirroring c6
|
||||
`DescriptorIndex.search_topk`; AZ-507 consumer-side structural cut
|
||||
so c2_vpr remains c6-import-free; defines the local `TileIdTuple`
|
||||
alias matching `VprCandidate.tile_id`'s composite identity).
|
||||
- `src/gps_denied_onboard/components/c2_vpr/config.py` (added
|
||||
`warn_top1_threshold: float = 0.30` and
|
||||
`debug_per_frame_distances: bool = False` config knobs + their
|
||||
validators in `__post_init__`).
|
||||
- `src/gps_denied_onboard/components/c2_vpr/__init__.py` (re-exports
|
||||
`DescriptorIndexCut` and `TileIdTuple` for the composition root's
|
||||
c6 adapter; deliberately does NOT re-export `FaissBridge` per the
|
||||
internal/`_underscore` convention — each strategy injects the
|
||||
bridge via its own `create(...)` factory).
|
||||
- `src/gps_denied_onboard/fdr_client/records.py` (registered
|
||||
`vpr.retrieve_topk` in `KNOWN_PAYLOAD_KEYS` with payload field set
|
||||
`{frame_id, backbone_label, top10_distances, latency_us}`).
|
||||
|
||||
Tests:
|
||||
|
||||
- `tests/unit/c2_vpr/test_faiss_bridge.py` (NEW — 22 tests covering
|
||||
AC-1..AC-11 + NFR-perf microbench + constructor validation +
|
||||
retrieve-argument validation).
|
||||
- `tests/unit/test_az272_fdr_record_schema.py` (added a fixture
|
||||
payload for `vpr.retrieve_topk` so the schema roundtrip suite
|
||||
covers the new record kind alongside `c12.reloc.requested`).
|
||||
|
||||
## Phase 1 — Context loading
|
||||
|
||||
Inputs read:
|
||||
|
||||
- Task spec: `_docs/02_tasks/todo/AZ-341_c2_faiss_retrieve_wiring.md`
|
||||
- Contracts: `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md`,
|
||||
`_docs/02_document/contracts/c6_tile_cache/descriptor_index.md`,
|
||||
`_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md`.
|
||||
- Module layout: `_docs/02_document/module-layout.md` (c2_vpr
|
||||
Per-Component Mapping; AZ-507 cross-component rule).
|
||||
- Existing surfaces: `c2_vpr/{interface,config,errors,__init__}.py`,
|
||||
`c6_tile_cache/{interface,errors,_types}.py`, `_types/vpr.py`,
|
||||
`fdr_client/{client,records}.py`, `clock/interface.py`,
|
||||
`c12_operator_orchestrator/operator_reloc_service.py` (pattern
|
||||
reference for FdrClient + Clock + logger composition).
|
||||
|
||||
## Phase 2 — Spec compliance
|
||||
|
||||
AC coverage map (every AC has at least one covering test):
|
||||
|
||||
| AC | Test |
|
||||
|----|------|
|
||||
| AC-1 happy-path retrieve | `test_retrieve_happy_path_returns_vpr_result_and_emits_fdr` |
|
||||
| AC-2 undersized → IndexUnavailableError | `test_retrieve_undersized_corpus_raises_index_unavailable_error` |
|
||||
| AC-3 unordered → IndexUnavailableError | `test_retrieve_unordered_distances_raises_index_unavailable_error` |
|
||||
| AC-4 c6 error propagates unchanged | `test_retrieve_propagates_index_unavailable_error_unchanged` |
|
||||
| AC-5 WARN threshold trigger | `test_retrieve_emits_warn_log_when_top1_above_threshold` |
|
||||
| AC-6 WARN not triggered when below | `test_retrieve_does_not_emit_warn_when_top1_below_threshold` |
|
||||
| AC-7 DEBUG on → log | `test_retrieve_emits_debug_log_when_per_frame_distances_on` |
|
||||
| AC-8 DEBUG off → no log | `test_retrieve_does_not_emit_debug_log_when_per_frame_distances_off` |
|
||||
| AC-9 FDR record fields | `test_retrieve_fdr_record_fields_are_populated_with_positive_latency` |
|
||||
| AC-10 retrieve_topk body is one line | `test_every_concrete_strategy_retrieve_topk_body_is_one_return_statement` |
|
||||
| AC-11 per-strategy descriptor_dim | `test_descriptor_dim_carried_through_to_each_candidate` |
|
||||
| NFR-perf p95 ≤ 0.5 ms | `test_bridge_retrieve_overhead_p95_under_500us` |
|
||||
|
||||
Contract verification:
|
||||
|
||||
- `vpr_strategy_protocol.md` — the bridge returns ``VprResult`` with
|
||||
exactly ``k`` candidates sorted ascending (INV-4), populates
|
||||
``backbone_label`` non-empty (INV-5), and propagates
|
||||
``IndexUnavailableError`` rather than returning stale candidates
|
||||
(C2-ST-01). All satisfied.
|
||||
- `descriptor_index.md` — the c6 surface returns ``≤ k`` results
|
||||
(Invariant I-2) and ``IndexUnavailableError`` on stale handle / dim
|
||||
mismatch. The bridge enforces a stricter operational invariant
|
||||
(== k) on top, which is the task-spec's explicit defended-in-depth
|
||||
intent (see Constraints § "The defensive INV-4 check is
|
||||
mandatory"). The operational/structural distinction is documented
|
||||
in `_faiss_bridge.py`'s module docstring.
|
||||
|
||||
## Phase 3 — Code quality
|
||||
|
||||
- **SRP**: `FaissBridge` has exactly one job — share the retrieve
|
||||
plumbing across strategies. `DescriptorIndexCut` has one job —
|
||||
cut c6 imports per AZ-507.
|
||||
- **Constructor validation**: every argument is type- and
|
||||
range-checked at construction; `ValueError` for value violations,
|
||||
`TypeError` for type violations.
|
||||
- **Error handling**: no bare `except`; only catches nothing (lets
|
||||
c6 errors propagate per AC-4); raises `IndexUnavailableError` from
|
||||
c2_vpr's family for the bridge's own INV-4 violations.
|
||||
- **Naming**: `_verify_invariants` / `_log_invariant_violation` /
|
||||
`_emit_fdr_record` read clearly from the call site.
|
||||
- **Complexity**: `retrieve(...)` is ~30 lines including blank
|
||||
lines; `_verify_invariants` is ~22 lines. All under 50.
|
||||
- **Test quality**: every test follows Arrange / Act / Assert with
|
||||
explicit comments; assertions check the actual record kind, the
|
||||
WARN log's structured `kv`, and the FDR payload by key — not
|
||||
just "no exception thrown".
|
||||
- **Dead code**: none.
|
||||
- **No verbose debug logging by default**: DEBUG per-frame distances
|
||||
is OFF by default (AC-8) — only ON when explicitly enabled by
|
||||
config; the WARN threshold is config-driven, not hard-coded.
|
||||
|
||||
## Phase 4 — Security quick-scan
|
||||
|
||||
- No SQL, no `subprocess`, no `eval` / `exec`.
|
||||
- No hardcoded secrets.
|
||||
- No user input is interpolated into log messages or FDR records
|
||||
beyond integer `frame_id` and string `backbone_label` already
|
||||
validated upstream.
|
||||
- Sensitive data: the FDR record carries `top10_distances` (raw
|
||||
floats) — not PII, not signing material. Safe to persist.
|
||||
|
||||
## Phase 5 — Performance scan
|
||||
|
||||
- `retrieve` is O(k) for the INV-4 sorted check + O(k) for
|
||||
candidate construction; k=10 typical.
|
||||
- No N+1 / unbounded fetch / blocking I/O / async-context issues.
|
||||
- FDR enqueue is non-blocking (SPSC ring, drop-oldest on overrun
|
||||
delegated to the AZ-274 overrun policy).
|
||||
- Microbench `test_bridge_retrieve_overhead_p95_under_500us`
|
||||
exercises 1000 calls with a wall-clock-backed `Clock` and asserts
|
||||
p95 ≤ 500 µs — passes.
|
||||
|
||||
## Phase 6 — Cross-task consistency
|
||||
|
||||
Single-task batch (AZ-341 only). N/A.
|
||||
|
||||
## Phase 7 — Architecture compliance
|
||||
|
||||
- **Layer direction**: c2_vpr (Layer 3) imports only from
|
||||
`_types.vpr` (L1), `clock` (L1), `config` (L1), `logging` (L1),
|
||||
`fdr_client` (L1), and its own component module
|
||||
(`descriptor_index_cut`). No upward imports. ✓
|
||||
- **Public API respect**: ZERO direct imports of
|
||||
`gps_denied_onboard.components.c6_tile_cache.*` in any c2_vpr
|
||||
source file — the bridge consumes `DescriptorIndexCut` per the
|
||||
AZ-507 rule.
|
||||
- **New cyclic dependencies**: none introduced.
|
||||
- **Duplicate symbols**: `IndexUnavailableError` already exists in
|
||||
both `c2_vpr.errors` and `c6_tile_cache.errors` (pre-existing,
|
||||
documented in both modules' docstrings — intentional namespace
|
||||
duplication for the closed-envelope rewrap pattern).
|
||||
- **Cross-cutting concerns**: `_iso_ts_from_clock` is duplicated
|
||||
across `c2_vpr/_faiss_bridge.py`, `c5_state`, `c11_tile_manager`,
|
||||
and `c12_operator_orchestrator/operator_reloc_service.py`. See
|
||||
Finding F1 below.
|
||||
|
||||
## Findings
|
||||
|
||||
| # | Severity | Category | File:Line | Title |
|
||||
|---|----------|----------------|------------------------------------------------------|-------|
|
||||
| 1 | Low | Maintainability | `c2_vpr/_faiss_bridge.py`:`_iso_ts_from_clock` | Duplicated `_iso_ts_from_clock` helper across c2/c5/c11/c12 |
|
||||
| 2 | Low | Spec-Gap | `_docs/02_tasks/todo/AZ-341_c2_faiss_retrieve_wiring.md` | Task spec's constructor lists `normaliser: DescriptorNormaliser` but body never calls it; bridge correctly omits the parameter |
|
||||
|
||||
### Finding Details
|
||||
|
||||
**F1: Duplicated `_iso_ts_from_clock` helper** (Low / Maintainability)
|
||||
|
||||
- Location: `c2_vpr/_faiss_bridge.py` `_iso_ts_from_clock`
|
||||
- Description: This six-line helper converts `clock.time_ns()` to an
|
||||
RFC 3339 UTC ISO string with nanosecond precision. The same body
|
||||
appears inline in `c12_operator_orchestrator/operator_reloc_service.py`,
|
||||
`c11_tile_manager`, and `c5_state`. Factoring it into a shared
|
||||
helper (e.g. `helpers.iso_ts_from_clock`) would let every FDR
|
||||
producer share one definition.
|
||||
- Suggestion: defer to a cross-cutting hygiene PBI (AZ-508
|
||||
"ISO timestamps consolidation" is already in `todo/` for exactly
|
||||
this concern). No action this batch.
|
||||
- Task: AZ-341
|
||||
|
||||
**F2: Spec lists unused `normaliser` constructor parameter** (Low / Spec-Gap)
|
||||
|
||||
- Location: `_docs/02_tasks/todo/AZ-341_c2_faiss_retrieve_wiring.md`
|
||||
§ Outcome (Constructor)
|
||||
- Description: The task-spec constructor signature names a
|
||||
`normaliser: DescriptorNormaliser` parameter, but the `retrieve(...)`
|
||||
body uses neither L2 nor intra-cluster normalisation — the
|
||||
embedding arrives already L2-normalised because every concrete
|
||||
`VprStrategy.embed_query` normalises before returning the
|
||||
`VprQuery` (per `VprStrategy` INV-3). Including the parameter
|
||||
would be dead weight on every strategy's `create(...)` factory
|
||||
with no behavioural effect.
|
||||
- Suggestion: surface to the user; if the spec genuinely intended
|
||||
defensive re-normalisation in the bridge, add it explicitly and
|
||||
document why (it conflicts with INV-3 of the producing strategy).
|
||||
Otherwise update the spec to drop the parameter. The
|
||||
implementation matches the literal AC list either way.
|
||||
- Task: AZ-341
|
||||
|
||||
## Verdict
|
||||
|
||||
PASS_WITH_WARNINGS — two Low-severity findings, both pre-existing
|
||||
or stylistic. No Critical, no High, no Medium. The implementation
|
||||
satisfies every AC, every NFR, the AZ-507 cross-component rule, and
|
||||
the existing C2 contract surface.
|
||||
|
||||
The implement skill's Auto-Fix gate accepts this verdict without
|
||||
intervention; both findings carry over to the cumulative review
|
||||
(batches 43-45) where they may be batched with similar items.
|
||||
Reference in New Issue
Block a user