# Test Specification: C2 Visual Place Recognition

Component-scoped. Suite-level coverage is in `_docs/02_document/tests/*.md`; canonical traceability is `tests/traceability-matrix.md`.

## Acceptance Criteria Traceability

| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-2.1b | Satellite-anchor registration meets AC-1.1/1.2/2.2/8.2/8.6 | FT-P-05, FT-P-19, **C2-IT-01** | Covered |
| AC-2.2 (cross-domain portion) | MRE <2.5 px cross-domain | FT-P-06, **C2-IT-02** (recall floor only; MRE owned by C3) | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01, **C2-PT-01** | Covered |
| AC-NEW-7 | Cache poisoning safety budget | NFT-SEC-01 (onboard-side), **C2-IT-03** | Covered (relaxed) |
| AC-8.6 (scale-ratio portion) | Scale-ratio satellite relocalization | FT-P-19, **C2-IT-04** | Covered |

---

## Component-Internal Tests

### C2-IT-01: top-K=10 recall at p=10 on Derkachi

**Summary**: the primary backbone (UltraVPR) achieves recall@10 ≥ 0.95 on the Derkachi normal segment against the pre-built corpus.

**Traces to**: AC-2.1b

**Description**: for each query frame in `flight_derkachi/normal_segment_60_stills/`, embed via UltraVPR; query the FAISS HNSW index; assert that the ground-truth tile (per recorded GPS + sector classification) is within the top-10 candidates for ≥ 95% of frames.

**Input data**: `flight_derkachi/normal_segment_60_stills/` + the Derkachi corpus FAISS index built by C10 (consumed read-only here).

**Expected result**: recall@10 ≥ 0.95 for UltraVPR; recall@10 ≥ 0.85 for the mandatory simple-baseline NetVLAD (engine rule check).

**Max execution time**: 90 s.

---

### C2-IT-02: VprResult schema invariants

**Summary**: every `VprResult` carries `len(candidates) == k`, monotonically non-decreasing `descriptor_distance`, and a non-empty `backbone_label`.

**Traces to**: AC-2.2 (downstream-coverage prerequisite)

**Description**: 100 frames through `retrieve_topk(k=10)`; assert (a) length, (b) sorted-ascending distance, (c) `backbone_label` non-empty.

**Input data**: `tests/fixtures/synthetic_vpr/diverse_100f/`.

**Expected result**: 100/100 results pass all invariants.

**Max execution time**: 10 s.

---

### C2-IT-03: cache-poisoning seed rejection at retrieval

**Summary**: when the corpus contains a poisoned tile injected during NFT-SEC-01 setup, the top-1 distance to that tile is bounded so downstream RANSAC + voting can reject it within the AC-NEW-7 relaxed budget.

**Traces to**: AC-NEW-7 (component-level partition)

**Description**: load the NFT-SEC-01 setup corpus (3-flight cumulative dataset with deflated covariance × 1.5–3); query each Derkachi frame; record the top-1 candidate for each. Pass if the poisoned tile is top-1 in fewer frames than the relaxed-CI threshold allows (relaxation per AC-text 2026-05-09).

**Input data**: NFT-SEC-01 setup corpus.

**Expected result**: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.

**Max execution time**: 5 min.

---

### C2-IT-04: scale-ratio invariance on satellite re-loc

**Summary**: when the nav-camera scale changes (e.g., altitude shift), VPR top-K still surfaces the correct tile.

**Traces to**: AC-8.6 (scale-ratio half)

**Description**: synthetically scale the Derkachi normal-segment frames by ±20% (mimicking altitude variation); assert recall@10 stays ≥ 0.85 across the scale sweep.

**Input data**: `flight_derkachi/scaled_+20%/` and `flight_derkachi/scaled_-20%/` (generated via deterministic resize).

**Expected result**: recall@10 ≥ 0.85 at both scale extremes for UltraVPR.

**Max execution time**: 90 s.

---

## Performance Tests

### C2-PT-01: backbone forward + HNSW lookup budget on Tier-2

**Traces to**: AC-4.1

**Load scenario**: 3 Hz frame rate, 10 min replay; corpus size 87 654 tiles (Derkachi area at 0.5 m/px).

**Expected results**:

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| `embed_query` p95 | ≤ 60 ms (UltraVPR / FP16) | 100 ms |
| `retrieve_topk(k=10)` p95 | ≤ 2 ms (HNSW) | 10 ms |
| Combined p95 | ≤ 65 ms | 110 ms |

**Resource limits**:

- GPU memory: ≤ 600 MB resident for backbone weights.
- System memory: ≤ 200 MB for the mmap'd FAISS index handle.

---

## Security Tests

### C2-ST-01: index-handle invalidation safety

**Summary**: after C10 rebuilds the FAISS index (a post-takeoff rebuild is FORBIDDEN, but this is a unit-level safety check), the previous handle held by C2 must not silently return stale results.

**Traces to**: defensive; no AC trace. Backstops a code-injection / config-drift mode that AC-NEW-7 already covers at the suite level.

**Test procedure**:

1. Build a tiny 100-tile FAISS index; mmap it through C2.
2. Replace the underlying file (simulating an out-of-band rebuild, which is FORBIDDEN at flight time per D-C10-3 but defended-in-depth here).
3. Call `retrieve_topk` and assert C2 raises `IndexUnavailableError` rather than returning stale candidates.

**Pass criteria**: `IndexUnavailableError` raised; no candidates returned.

**Fail criteria**: any candidate returned.

---

## Acceptance Tests

C2 contributes to AC-2.1b / AC-8.6 via the suite-level FT scenarios. No additional C2-only acceptance tests.

---

## Test Data Management

| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `synthetic_vpr/diverse_100f/` | 100 diverse synthetic frames for invariant checks | generated, deterministic | ~50 MB |
| `flight_derkachi/normal_segment_60_stills/` | shared with C1 | curated | shared |
| `flight_derkachi/scaled_±20%/` | scale-ratio sweep | generated from above | ~40 MB |
| Derkachi FAISS corpus | C10's output, consumed read-only | C10 build artifact | ~200 MB |

**Setup procedure**:

1. C10 must have built the Derkachi FAISS index (this is a Step-5-test-side prereq; in the test runner, the corpus is staged from `tests/fixtures/cache_artifacts/`).
2. Synthetic and scaled fixtures are generated by deterministic scripts.

**Teardown**: corpus is read-only; nothing to clean up.

**Data isolation**: each test gets its own `tests/tmp/c2//`.
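
**Illustrative pass-criteria helpers**: the recall@10 criterion in C2-IT-01 and C2-IT-04 and the three invariants in C2-IT-02 could be checked along the lines of the minimal sketch below. It is not C2's actual code. The `Candidate` dataclass, its `tile_id` field, and both helper functions are hypothetical names introduced for illustration; only `VprResult`, `candidates`, `descriptor_distance`, and `backbone_label` come from this specification, and their real types may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; `tile_id` is an assumed field name."""
    tile_id: str
    descriptor_distance: float


@dataclass(frozen=True)
class VprResult:
    """Stand-in for C2's VprResult, reduced to the fields this spec names."""
    candidates: Sequence[Candidate]
    backbone_label: str


def check_invariants(result: VprResult, k: int) -> None:
    """Assert the three C2-IT-02 invariants on a single result."""
    # (a) exactly k candidates were returned
    assert len(result.candidates) == k, "expected exactly k candidates"
    # (b) descriptor distances are sorted ascending (ties allowed)
    distances = [c.descriptor_distance for c in result.candidates]
    assert all(a <= b for a, b in zip(distances, distances[1:])), \
        "descriptor_distance must be non-decreasing"
    # (c) the backbone label is a non-empty string
    assert result.backbone_label, "backbone_label must be non-empty"


def recall_at_k(results: Sequence[VprResult],
                ground_truth_tiles: Sequence[str],
                k: int = 10) -> float:
    """Fraction of frames whose ground-truth tile is among the first k candidates."""
    if not results:
        raise ValueError("at least one result is required")
    if len(results) != len(ground_truth_tiles):
        raise ValueError("one ground-truth tile is required per result")
    hits = sum(
        any(c.tile_id == truth for c in result.candidates[:k])
        for result, truth in zip(results, ground_truth_tiles)
    )
    return hits / len(results)
```

Under these assumptions, a test such as C2-IT-01 would collect one `VprResult` per query frame, call `check_invariants(result, k=10)` on each, and then assert `recall_at_k(results, truths, k=10) >= 0.95`. C2-IT-04 would repeat the same check per scale extreme with the 0.85 floor.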