Mirror of https://github.com/azaion/gps-denied-onboard.git (synced 2026-06-22 10:31:12 +00:00)
[AZ-340] [AZ-527] Archive AZ-340 + batch 51 report + cumulative review 49-51
Bookkeeping for batch 51 close:

- Archive AZ-340 spec todo/ -> done/
- Add _docs/03_implementation/batch_51_cycle1_report.md
- Add _docs/03_implementation/cumulative_review_batches_49-51_cycle1_report.md
  Verdict: PASS_WITH_WARNINGS. F1 (Medium) escalates the 2-way _assert_engine_output_dim near-duplicate from cumulative-46-48 to a 7-way duplication after AZ-339 + AZ-340; new hygiene PBI AZ-527 formally created. F2 (Low) carries the AC-10 ConfigError vs literal ConfigurationError spec drift (documentation only).
- File AZ-527 hygiene PBI (Hygiene -- consolidate _assert_engine_output_dim into a c2-internal helper, 2pt, AZ-255 E-C2). Add the spec stub at _docs/02_tasks/todo/AZ-527_*.md.
- Refresh _docs/02_tasks/_dependencies_table.md: +AZ-527 row, totals bumped to 148 tasks / 491 points.
- Bump _docs/_autodev_state.md: last_completed_batch=51, last_cumulative_review=batches_49-51.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,218 @@
# C2 SelaVPR + EigenPlaces + SALAD Secondary Backbones

**Task**: AZ-340_c2_selavpr_eigenplaces_salad
**Name**: C2 SelaVPR + EigenPlaces + SALAD Secondary Backbones (Research-only)
**Description**: Implement `SelaVprStrategy`, `EigenPlacesStrategy`, and `SaladStrategy` — three additional secondary `VprStrategy` backbones used for IT-12 comparative-study (research binary only). All run on the C7 TensorRT runtime (FP16 engines compiled by C10) and are gated OFF for airborne / operator-tooling per ADR-002. Each strategy ships its own concrete `BackbonePreprocessor` per upstream code drop. Embeddings: SelaVPR D=512, EigenPlaces D=2048, SALAD D=8448 (the largest in the C2 family — DINOv2-backed). All three produce L2-normalised embeddings; all three delegate `retrieve_topk` to the C6 TileStore Public API.
**Complexity**: 5 points
**Dependencies**: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-298_c7_tensorrt_runtime, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-281_engine_filename_schema, AZ-321_c10_engine_compiler, AZ-266_log_module, AZ-272_fdr_record_schema
**Component**: c2_vpr (epic AZ-255 / E-C2)
**Tracker**: AZ-340
**Epic**: AZ-255 (E-C2)

### Document Dependencies

- `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — Protocol contract; all three strategies satisfy every invariant.
- `_docs/02_document/components/02_c2_vpr/description.md` — § 1 secondary backbone designation; § 5 backbone library list (SALAD added per module-layout `BUILD_VPR_SALAD` row).
- `_docs/02_document/module-layout.md` — `c2_vpr.sela_vpr`, `c2_vpr.eigen_places`, `c2_vpr.salad` Internal entries; `BUILD_VPR_SELAVPR`, `BUILD_VPR_EIGENPLACES`, `BUILD_VPR_SALAD` rows (all OFF for airborne/operator-tooling, ON for research, replay-cli inherits research selection at config time).
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — `InferenceRuntime` interface (TRT runtime).
- `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` — L2 normalisation.

## Problem

Without this task:

- The IT-12 comparative-study cannot enumerate SelaVPR, EigenPlaces, or SALAD; researchers cannot compare these three modern backbones (SelaVPR introduced 2024, EigenPlaces a strong baseline since 2023, SALAD a DINOv2-backed 2024 candidate) against UltraVPR / NetVLAD / MegaLoc / MixVPR.
- The research binary's link surface is incomplete; the comparative-study CI matrix entry asserting the research binary contains every documented backbone fails.
- A future cycle that wants to swap one of these to PRIMARY (e.g., SALAD's DINOv2 backbone may eventually outperform UltraVPR; the research data informs that decision) has no migration path.
- SALAD specifically uses DINOv2 — a fundamentally different backbone family (vision transformer rather than CNN) — and adding it to the comparative-study is research-strategy critical.

## Outcome

- `src/gps_denied_onboard/components/c2_vpr/sela_vpr.py` defining `SelaVprStrategy` (Protocol-conforming) + `create(config, tile_store, inference_runtime)` factory.
  - `backbone_label="sela_vpr"`, `descriptor_dim=512`.
  - Constructor / `embed_query` / `retrieve_topk` / `descriptor_dim` follow the same pattern as MegaLoc / MixVPR.
- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_sela_vpr.py` defining `SelaVprBackbonePreprocessor`:
  - `input_shape() -> (224, 224)` per upstream SelaVPR default.
  - Normalisation: ImageNet mean/std.
  - Output dtype FP16, NCHW.
- `src/gps_denied_onboard/components/c2_vpr/eigen_places.py` defining `EigenPlacesStrategy`:
  - `backbone_label="eigen_places"`, `descriptor_dim=2048`.
  - Same pattern as SelaVPR.
- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_eigen_places.py` defining `EigenPlacesBackbonePreprocessor`:
  - `input_shape() -> (480, 480)` per upstream EigenPlaces default.
  - Normalisation: ImageNet mean/std.
- `src/gps_denied_onboard/components/c2_vpr/salad.py` defining `SaladStrategy`:
  - `backbone_label="salad"`, `descriptor_dim=8448`.
  - Same pattern as the others; SALAD's DINOv2 backbone produces patch tokens that the SALAD aggregator turns into a single 8448-d descriptor.
- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_salad.py` defining `SaladBackbonePreprocessor`:
  - `input_shape() -> (322, 322)` per SALAD's published preprocessing (DINOv2-aligned input).
  - Normalisation: ImageNet mean/std (DINOv2's default).
- Composition-root wiring paths for `config.vpr.strategy in {"sela_vpr", "eigen_places", "salad"}`.
- `BUILD_VPR_SELAVPR`, `BUILD_VPR_EIGENPLACES`, `BUILD_VPR_SALAD` CMake flags wired per ADR-002.
- Logging + FDR records identical pattern to UltraVPR / MegaLoc / MixVPR (per-backbone `backbone_label` distinguishes records).
- Engine output shape assertion at load for all three.
- Unit tests covering Protocol conformance + invariants + error paths for ALL THREE strategies.
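The preprocessor bullets above (centre crop, resize to the per-strategy `input_shape()`, ImageNet mean/std, FP16 NCHW output) can be illustrated with a minimal NumPy sketch. The function names are hypothetical, and the nearest-neighbour resize is only a stand-in for the OpenCV resize the production preprocessors are expected to use; nothing here is taken from the actual `_preprocessor_*.py` files.

```python
import numpy as np

# Commonly published ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def centre_crop_square(image: np.ndarray) -> np.ndarray:
    """Crop the largest centred square from an HxWx3 image."""
    h, w = image.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return image[top:top + side, left:left + side]


def resize_nearest(image: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Nearest-neighbour resize (assumption: stand-in for an OpenCV resize)."""
    out_h, out_w = out_hw
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols]


def preprocess(image_rgb_u8: np.ndarray, input_shape: tuple) -> np.ndarray:
    """Return a (1, 3, H, W) float16 tensor normalised with ImageNet statistics."""
    square = centre_crop_square(image_rgb_u8)
    resized = resize_nearest(square, input_shape)
    scaled = resized.astype(np.float32) / 255.0
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    nchw = np.transpose(normalised, (2, 0, 1))[None, ...]
    return np.ascontiguousarray(nchw, dtype=np.float16)


# Per-strategy input shapes as listed in this spec.
INPUT_SHAPES = {"sela_vpr": (224, 224), "eigen_places": (480, 480), "salad": (322, 322)}
```

Calling `preprocess(frame, INPUT_SHAPES["salad"])` on a 480×640 RGB frame yields a `(1, 3, 322, 322)` float16 array, which is the shape the "Preprocess-shape" unit test row asks for.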

## Scope

### Included

- All three strategy classes (`SelaVprStrategy`, `EigenPlacesStrategy`, `SaladStrategy`) implementing the Protocol.
- All three concrete `BackbonePreprocessor` implementations.
- Module-level `create` factories for all three.
- Composition-root wiring for all three strategy choices.
- Engine output shape assertion at load for all three.
- Logging + FDR records identical pattern to other backbones.
- Unit tests for all three strategies covering invariants + error paths.
- `BUILD_VPR_SELAVPR`, `BUILD_VPR_EIGENPLACES`, `BUILD_VPR_SALAD` CMake flag wiring.

### Excluded

- The `VprStrategy` Protocol — owned by AZ-336.
- Shared `DescriptorNormaliser` — already AZ-283.
- C7 TensorRT runtime — owned by AZ-298.
- Engine compilation — owned by AZ-321.
- Other backbones — AZ-337 (UltraVPR), AZ-338 (NetVLAD), AZ-339 (MegaLoc + MixVPR).
- FAISS retrieve wiring — owned by AZ-341.
- Recall@10 acceptance tests for these secondary backbones — deferred to Step 9 / E-BBT (research-only, not engine-rule-binding).

## Acceptance Criteria

**AC-1 (per strategy): Protocol conformance**
Given a constructed instance of each strategy
When `isinstance(strategy, VprStrategy)` is evaluated
Then all three return `True`
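A minimal sketch of how an `isinstance` check against a Protocol can work, assuming the AZ-336 Protocol is declared with `typing.runtime_checkable`. The method names come from this spec; the argument names and the placeholder class are assumptions, and the real Protocol may carry more members.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class VprStrategy(Protocol):
    """Assumed shape of the Protocol; the authoritative definition lives in AZ-336."""

    def embed_query(self, frame, calibration): ...

    def retrieve_topk(self, query, k): ...

    def descriptor_dim(self) -> int: ...


class SelaVprStrategySketch:
    """Placeholder class: it conforms structurally, without inheriting from the Protocol."""

    backbone_label = "sela_vpr"

    def embed_query(self, frame, calibration):
        raise NotImplementedError("sketch only")

    def retrieve_topk(self, query, k):
        raise NotImplementedError("sketch only")

    def descriptor_dim(self) -> int:
        return 512


conforms = isinstance(SelaVprStrategySketch(), VprStrategy)
```

Note that a runtime-checkable Protocol only verifies that the named methods exist. It does not check signatures or behaviour, which is why the remaining acceptance criteria are still needed.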

**AC-2 (per strategy): `embed_query` produces L2-normalised FP16 embedding of correct dim**
Given a valid `NavCameraFrame` and `CameraCalibration`
When `embed_query` is called on each strategy
Then SelaVPR returns shape (512,); EigenPlaces returns (2048,); SALAD returns (8448,); all `dtype == np.float16`; all have `||embedding||_2 == 1.0 ± 1e-3`
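The AC-2 check can be sketched as follows. `l2_normalise_fp16` is a hypothetical stand-in for the shared `DescriptorNormaliser` (AZ-283), whose real interface is not shown in this spec. The sketch normalises in float32 and casts to FP16 last, then measures the norm in float32.

```python
import numpy as np


def l2_normalise_fp16(raw) -> np.ndarray:
    """Normalise in float32, then cast to float16 (hypothetical helper)."""
    v = np.asarray(raw, dtype=np.float32).ravel()
    norm = float(np.linalg.norm(v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return (v / norm).astype(np.float16)


def check_embedding(embedding: np.ndarray, expected_dim: int, tol: float = 1e-3) -> None:
    """Assert the three AC-2 properties: shape, dtype, and unit L2 norm within tol."""
    assert embedding.shape == (expected_dim,)
    assert embedding.dtype == np.float16
    norm = float(np.linalg.norm(embedding.astype(np.float32)))
    assert abs(norm - 1.0) <= tol, norm


EXPECTED_DIMS = {"sela_vpr": 512, "eigen_places": 2048, "salad": 8448}
```

Measuring the norm after upcasting avoids accumulating the sum of squares in FP16, which would add rounding error of its own on top of the error being tested.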

**AC-3 (per strategy): Deterministic embeddings**
Given the same frame
When `embed_query` is called 3 times on each strategy
Then the three embeddings match for each strategy (bit-exact, within FP16 ULP tolerance)
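One way to make the ULP-tolerant comparison concrete is to compare FP16 bit patterns on an ordered integer scale. This is a sketch under assumptions: the spec does not state the allowed tolerance, so the `max_ulp=1` default is illustrative only, and the helper names are hypothetical.

```python
import numpy as np


def fp16_ulp_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise distance in FP16 units-in-the-last-place between two float16 arrays."""

    def ordered(x: np.ndarray) -> np.ndarray:
        # Map sign-magnitude bit patterns to a monotonic integer scale,
        # so that +0.0 and -0.0 coincide and negatives sort below positives.
        bits = np.ascontiguousarray(x, dtype=np.float16).view(np.uint16).astype(np.int32)
        magnitude = bits & 0x7FFF
        negative = (bits & 0x8000) != 0
        return np.where(negative, -magnitude, magnitude)

    return np.abs(ordered(a) - ordered(b))


def embeddings_match(a: np.ndarray, b: np.ndarray, max_ulp: int = 1) -> bool:
    """True when every element differs by at most max_ulp (assumed tolerance)."""
    return bool(np.all(fp16_ulp_distance(a, b) <= max_ulp))
```

With `max_ulp=0` the check degenerates to a strict bit-exact comparison (treating `+0.0` and `-0.0` as equal), so a single helper covers both readings of the criterion.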

**AC-4 (per strategy): `retrieve_topk` returns exactly k candidates with correct backbone_label**
Given a corpus of 100 tiles per strategy's `descriptor_dim` + a constructed `VprQuery`
When `retrieve_topk(query, k=10)` is called on each strategy
Then `len(candidates) == 10`, sorted ascending; correct `backbone_label` (`"sela_vpr"` / `"eigen_places"` / `"salad"`); correct `descriptor_dim` carried in candidates
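The delegation described in this criterion can be sketched with a brute-force fake store. Everything named here is hypothetical: `Candidate`, `FakeTileStore.search`, and passing a plain embedding instead of a `VprQuery` are simplifications, and "sorted ascending" is read as ascending by distance, which the spec does not state explicitly.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; the real type comes from the Protocol contract."""
    tile_id: int
    distance: float
    backbone_label: str
    descriptor_dim: int


class FakeTileStore:
    """Brute-force stand-in for the C6 TileStore; `search` is an assumed method name."""

    def __init__(self, descriptors: dict) -> None:
        self._descriptors = descriptors

    def search(self, embedding: list, k: int) -> list:
        scored = sorted(
            (math.dist(embedding, descriptor), tile_id)
            for tile_id, descriptor in self._descriptors.items()
        )
        return [(tile_id, distance) for distance, tile_id in scored[:k]]


class SaladStrategySketch:
    backbone_label = "salad"

    def __init__(self, tile_store: FakeTileStore, descriptor_dim: int) -> None:
        self._tile_store = tile_store
        self._descriptor_dim = descriptor_dim

    def retrieve_topk(self, embedding: list, k: int) -> list:
        # The strategy adds no ranking of its own; it only labels the store's hits.
        hits = self._tile_store.search(embedding, k)
        return [
            Candidate(tile_id, distance, self.backbone_label, self._descriptor_dim)
            for tile_id, distance in hits
        ]


def build_corpus(n_tiles: int, dim: int, seed: int = 0) -> dict:
    """Random fixture corpus: tile_id -> descriptor."""
    rng = random.Random(seed)
    return {tid: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tid in range(n_tiles)}
```

Keeping the ranking inside the store means the three strategies differ only in `backbone_label` and `descriptor_dim`, which is what the criterion inspects.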

**AC-5 (per strategy): `descriptor_dim()` is stable**
Given a constructed strategy
When `descriptor_dim()` is called 100 times
Then SelaVPR returns 512; EigenPlaces returns 2048; SALAD returns 8448

**AC-6 (per strategy): Engine output shape mismatch → `ConfigurationError`**
Given a TRT engine whose output tensor shape does not match the strategy's expected `descriptor_dim`
When `create(...)` is called
Then `ConfigurationError` is raised; the strategy is NOT instantiated
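A sketch of the load-time assertion. The helper name `_assert_engine_output_dim` appears in the commit message above; the signature and the locally defined `ConfigurationError` are assumptions, and the expected `(1, D)` layout follows the `(1, 8448)` shape mentioned under Runtime Completeness.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's configuration error type."""


def _assert_engine_output_dim(output_shape, expected_dim: int, backbone_label: str) -> None:
    """Raise ConfigurationError unless the engine output is (1, expected_dim)."""
    actual = tuple(output_shape)
    if actual != (1, expected_dim):
        raise ConfigurationError(
            f"{backbone_label}: engine output shape {actual} does not match "
            f"expected (1, {expected_dim})"
        )


def create_sketch(output_shape, expected_dim: int, backbone_label: str) -> dict:
    """Hypothetical factory: validate first, construct only afterwards."""
    _assert_engine_output_dim(output_shape, expected_dim, backbone_label)
    return {"backbone_label": backbone_label, "descriptor_dim": expected_dim}
```

Running the assertion before any object is built is what makes "the strategy is NOT instantiated" hold on mismatch.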

**AC-7 (per strategy): `VprBackboneError` on forward-pass failure**
Given an `InferenceRuntime` test double that raises
When `embed_query` is called
Then `VprBackboneError` is raised; ERROR log + FDR record emitted

**AC-8 (per strategy): `VprPreprocessError` on corrupt image bytes**
Given a frame with malformed `image_bytes`
When `embed_query` is called
Then `VprPreprocessError` is raised; ERROR log + FDR record emitted
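AC-7 and AC-8 both describe translating a lower-level failure into a typed error plus an ERROR log and FDR record. A minimal sketch, assuming a `forward(engine_id, tensor)` runtime call and using a plain list as a stand-in for the log and FDR sinks; the exception classes are defined locally for illustration and `decode_stub` is hypothetical.

```python
class VprBackboneError(Exception):
    """Local stand-in for the project's backbone error type."""


class VprPreprocessError(Exception):
    """Local stand-in for the project's preprocessing error type."""


class FailingRuntime:
    """Test double whose forward pass always raises."""

    def forward(self, engine_id, tensor):
        raise RuntimeError("simulated forward-pass failure")


def decode_stub(image_bytes: bytes) -> list:
    """Hypothetical decoder: rejects empty payloads, otherwise returns a fake tensor."""
    if not image_bytes:
        raise ValueError("empty image payload")
    return list(image_bytes)


def embed_query_sketch(runtime, engine_id, image_bytes, decode, events: list):
    """Wrap each stage so callers only ever see the two typed errors."""
    try:
        tensor = decode(image_bytes)
    except Exception as exc:
        events.append(("ERROR", "preprocess", str(exc)))  # stands in for log + FDR
        raise VprPreprocessError(str(exc)) from exc
    try:
        return runtime.forward(engine_id, tensor)
    except Exception as exc:
        events.append(("ERROR", "backbone", str(exc)))  # stands in for log + FDR
        raise VprBackboneError(str(exc)) from exc
```

Using `raise ... from exc` keeps the original exception attached as `__cause__`, so the root cause stays visible in tracebacks.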

**AC-9 (per strategy): Composition-root wiring**
Given `config.vpr.strategy = "sela_vpr"` (resp. `"eigen_places"`, `"salad"`) AND valid weights AND matching `descriptor_dim`
When `compose_root(config)` runs
Then the corresponding strategy is wired; AZ-336 factory's pre-flight `descriptor_dim` validation passes; INFO log `kind="c2.vpr.ready"` emitted with correct `{strategy, descriptor_dim}`

**AC-10 (per strategy): Build-flag exclusion in airborne binary**
Given the strategy is selected AND its `BUILD_VPR_*` flag is OFF
When the binary tries to load
Then `ConfigurationError` is raised with the missing-flag message; fail-fast
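The fail-fast gate in AC-10 can be sketched as a lookup from strategy name to build flag. The flag names come from this spec; representing the enabled flags as a set, the function name, and the exact message text are assumptions about how the composition root might see them.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's configuration error type."""


BUILD_FLAG_BY_STRATEGY = {
    "sela_vpr": "BUILD_VPR_SELAVPR",
    "eigen_places": "BUILD_VPR_EIGENPLACES",
    "salad": "BUILD_VPR_SALAD",
}


def require_build_flag(strategy: str, enabled_flags: set) -> str:
    """Return the flag guarding `strategy`, or raise if it is not enabled in this binary."""
    flag = BUILD_FLAG_BY_STRATEGY[strategy]
    if flag not in enabled_flags:
        raise ConfigurationError(
            f"strategy {strategy!r} requires build flag {flag}=ON, "
            "which is OFF in this binary"
        )
    return flag


# Assumed flag sets: research enables all three, airborne enables none of them.
RESEARCH_FLAGS = set(BUILD_FLAG_BY_STRATEGY.values())
AIRBORNE_FLAGS = set()
```

The check is intended to run during composition, before any engine is loaded, so a misconfigured airborne binary stops at start-up rather than at the first frame.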

**AC-11 (per strategy): Preprocessing input shape**
Given the strategy's preprocessor instance
When `input_shape()` is called
Then SelaVPR returns `(224, 224)`; EigenPlaces returns `(480, 480)`; SALAD returns `(322, 322)`

## Non-Functional Requirements

**Performance** (research-only; looser than UltraVPR):
- SelaVPR `embed_query` p95 ≤ 60 ms (FP16 TRT; 224×224 input is light).
- EigenPlaces `embed_query` p95 ≤ 80 ms (480×480 input + ResNet50-class backbone).
- SALAD `embed_query` p95 ≤ 120 ms (DINOv2-Large backbone is the heaviest in the C2 family).
- `retrieve_topk` p95: SelaVPR ≤ 2 ms, EigenPlaces ≤ 3 ms, SALAD ≤ 6 ms (8448-d FAISS HNSW is significantly slower; this is the cost of DINOv2's large embedding space).
- GPU memory per strategy: SelaVPR ≤ 400 MB, EigenPlaces ≤ 700 MB, SALAD ≤ 1200 MB resident (DINOv2-Large is heavy).
- These NFRs are research-side guidance, not engine-rule blockers.
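A small harness for checking the p95 budgets listed above. The budget values are copied from this section; the nearest-rank percentile definition and the helper names are assumptions, since the spec does not say how p95 is computed.

```python
import time

# embed_query p95 budgets in milliseconds, as listed in this section.
EMBED_BUDGET_MS = {"sela_vpr": 60.0, "eigen_places": 80.0, "salad": 120.0}


def p95(samples_ms: list) -> float:
    """Nearest-rank 95th percentile (assumed definition)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def measure_ms(fn, repeats: int = 200) -> list:
    """Wall-clock latency of fn() in milliseconds, one sample per call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def within_budget(label: str, samples_ms: list) -> bool:
    """Compare the measured p95 with the documented budget for `label`."""
    return p95(samples_ms) <= EMBED_BUDGET_MS[label]
```

Because the NFRs are guidance rather than blockers, a harness like this would more plausibly report the result than fail a build on it.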

**Compatibility**
- All three consume TRT engines produced by AZ-321 with the AZ-281 self-describing filename schema.
- Upstream code drops pinned per Plan-phase; SALAD specifically depends on a pinned DINOv2 weight set.

**Reliability**
- All three single-threaded by contract.
- All three use unconditional L2-normalisation (INV-3).
- Errors do not crash the process; downstream falls back to VIO-only.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 (each) | `isinstance(<Strategy>(...), VprStrategy)` | `True` for all three |
| AC-2 (SelaVPR) | `embed_query` output | shape (512,), float16, L2-norm ≈ 1.0 |
| AC-2 (EigenPlaces) | `embed_query` output | shape (2048,), float16, L2-norm ≈ 1.0 |
| AC-2 (SALAD) | `embed_query` output | shape (8448,), float16, L2-norm ≈ 1.0 |
| AC-3 (each) | `embed_query` × 3 same frame | bit-exact embeddings (ULP-tolerant) |
| AC-4 (each) | `retrieve_topk` against fixture corpus | `len == 10`, sorted, correct `backbone_label`, correct `descriptor_dim` |
| AC-5 (each) | `descriptor_dim()` × 100 | always returns the correct dim |
| AC-6 (each) | TRT engine with wrong output shape | `ConfigurationError` at create time |
| AC-7 (each) | `forward` raises | `VprBackboneError`; ERROR log + FDR |
| AC-8 (each) | malformed `image_bytes` | `VprPreprocessError`; ERROR log + FDR |
| AC-9 (each) | `compose_root(config=<strategy>)` | wired; INFO log with correct backbone label and dim |
| AC-10 (each) | airborne binary + strategy chosen | `ConfigurationError` with missing-flag message; fail-fast |
| AC-11 (SelaVPR) | `input_shape()` | `(224, 224)` |
| AC-11 (EigenPlaces) | `input_shape()` | `(480, 480)` |
| AC-11 (SALAD) | `input_shape()` | `(322, 322)` |
| Preprocess-shape (each) | `preprocess(frame)` output | NCHW shape `(1, 3, H, W)`, dtype float16 |

## Constraints

- **Each strategy ships its own concrete preprocessor** — preprocessing parameters per upstream code drop.
- **Preprocessing parameters are weights-coupled**: hard-coded per strategy (SelaVPR 224×224, EigenPlaces 480×480, SALAD 322×322); ImageNet mean/std for all three (the DINOv2 weights behind SALAD also expect ImageNet mean/std).
- **Centre-crop logic duplicated, NOT shared** — same trade-off as MegaLoc / MixVPR.
- **All three use TensorRT runtime** (consistent with UltraVPR / MegaLoc / MixVPR).
- **No engine compilation in this task** — `.trt` engine files come from AZ-321.
- **All three hold engine IDs returned by `inference_runtime.load_engine`, NOT engines themselves**.
- **No GPU operations in `__init__` beyond engine load**.
- **SALAD's high embedding dim (8448) is non-negotiable**: it is the architectural output of the SALAD aggregator over DINOv2 patch tokens. Operators who want a smaller SALAD descriptor must apply PCA-whitening at corpus build time (C10); that variant would sit behind a separate `BUILD_VPR_SALAD_PCA` build flag (out of scope here).

## Risks & Mitigation

**Risk 1: SALAD's DINOv2 backbone is significantly heavier than other C2 backbones**
- *Risk*: GPU memory + latency budget for SALAD blows the research binary's resource envelope; researchers cannot run multi-strategy comparisons in a single session.
- *Mitigation*: SALAD's NFR-perf budget is documented at 120 ms / 1200 MB GPU — significantly looser than UltraVPR. Researchers run SALAD comparisons in single-strategy sessions. If multi-strategy comparison is required, the operator can disable SALAD via build flag for that specific session.

**Risk 2: SALAD's 8448-d FAISS lookup is slow**
- *Risk*: FAISS HNSW with D=8448 may exceed budget on Tier-2 hardware.
- *Mitigation*: 6 ms p95 is the documented budget (4× the UltraVPR D=512 lookup); that is still well under the roughly 333 ms frame period at 3 Hz. PCA-whitened SALAD (D=512 or D=1024) is the operator-side optimisation if needed; that's a corpus-build-time decision (C10), not a strategy change.
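The arithmetic behind this mitigation can be checked in a few lines. The 3 Hz frame rate and the 6 ms budget come from the text; the assumption that each stored descriptor costs two bytes per element (FP16, matching the embedding dtype) is illustrative, since the corpus storage format is not specified here.

```python
FRAME_RATE_HZ = 3
SALAD_LOOKUP_BUDGET_MS = 6.0
DESCRIPTOR_DIMS = {"sela_vpr": 512, "eigen_places": 2048, "salad": 8448}
BYTES_PER_FP16 = 2  # assumption: descriptors are stored as FP16

# Time available per frame at the stated rate, in milliseconds.
frame_period_ms = 1000.0 / FRAME_RATE_HZ

# Fraction of each frame period consumed by a worst-case SALAD lookup.
lookup_share = SALAD_LOOKUP_BUDGET_MS / frame_period_ms

# Size of one descriptor per backbone, in bytes.
descriptor_bytes = {label: dim * BYTES_PER_FP16 for label, dim in DESCRIPTOR_DIMS.items()}

# How much larger a SALAD descriptor is than a 512-d one.
salad_vs_512 = DESCRIPTOR_DIMS["salad"] / DESCRIPTOR_DIMS["sela_vpr"]
```

Under these assumptions the lookup takes under 2% of the frame period, while a SALAD descriptor is 16.5 times the size of a 512-d one, which is the trade-off that PCA-whitening at corpus build time would address.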

**Risk 3: SelaVPR / EigenPlaces / SALAD upstream code drops use ONNX ops that TRT 10.3 cannot compile**
- *Risk*: Engine compilation succeeds with fallback layers; latency inflates beyond NFR.
- *Mitigation*: AZ-321 (engine compile) detects fallback layers. Each strategy is independently affected; one failure does not block others.

**Risk 4: SALAD's DINOv2 backbone weights have a non-standard licence**
- *Risk*: The licence of the DINOv2 weights may be incompatible with project distribution. The original DINOv2 release was CC-BY-NC; later releases moved to Apache-2.0, so the outcome depends on the pinned weight set.
- *Mitigation*: Licence check is operator's responsibility (Plan-phase pinning of upstream); this task implements the strategy assuming licensed weights are available. If licence prevents distribution, the operator does not select SALAD; the strategy class still exists for future use if licence changes.

**Risk 5: Preprocessing duplication across 7 strategies invites drift**
- *Risk*: A fix for a bug in the centre-crop logic does not propagate automatically across the 7 strategies' preprocessors.
- *Mitigation*: Same trade-off as MegaLoc / MixVPR: duplication is intentional per description.md § 6. Code review is relied on to apply such fixes to every affected strategy.

**Risk 6: Test fixtures for these engines don't exist in CI**
- *Risk*: Without TRT engines, full `embed_query` cannot be tested via unit tests.
- *Mitigation*: Step 9 / E-BBT validates the real engine path. Unit tests use `FakeInferenceRuntime` for Protocol conformance + invariants; this is sufficient for the Step 6 task scope.

## Runtime Completeness

- **Named capability**: secondary `VprStrategy` implementations (SelaVPR, EigenPlaces, SALAD) for IT-12 comparative-study (architecture / E-C2 / `solution.md` "SelaVPR, EigenPlaces secondary backbones"; SALAD per `module-layout.md` `BUILD_VPR_SALAD` row).
- **Production code that must exist**: real `SelaVprStrategy`, `EigenPlacesStrategy`, `SaladStrategy` classes calling real C7 TRT `InferenceRuntime.forward`; real concrete preprocessors with real OpenCV resize + ImageNet normalisation + FP16 cast; real L2-normalisation; real composition-root wiring paths.
- **Allowed external stubs**: tests MAY use `FakeInferenceRuntime` returning pre-computed embeddings; `FakeTileStore`; `FakeFdrClient`; production wiring uses real C7 + real engines + real C6.
- **Unacceptable substitutes**: NumPy-only forward passes (would not satisfy NFR budgets, would defeat GPU-bound design); skipping L2-normalisation (would break INV-3); shared preprocessors across strategies (would defeat description.md § 6 isolation); selecting these strategies in airborne binaries (must fail-fast per AC-10); engine load at first frame; per-strategy thread safety; bypassing the Protocol contract for SALAD's high-dim case (e.g., not validating the (1, 8448) engine output shape).