# Test Specification — C2 Visual Place Recognition

Component-scoped. Suite-level coverage is in `_docs/02_document/tests/*.md`; canonical traceability is `tests/traceability-matrix.md`.

## Acceptance Criteria Traceability

| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-2.1b | Satellite-anchor registration meets AC-1.1/1.2/2.2/8.2/8.6 | FT-P-05, FT-P-19, **C2-IT-01** | Covered |
| AC-2.2 (cross-domain portion) | MRE <2.5 px cross-domain | FT-P-06, **C2-IT-02** (schema invariants only, as a downstream-coverage prerequisite; MRE owned by C3) | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01, **C2-PT-01** | Covered |
| AC-NEW-7 | Cache poisoning safety budget | NFT-SEC-01 (onboard-side), **C2-IT-03** | Covered (relaxed) |
| AC-8.6 (scale-ratio portion) | Scale-ratio satellite relocalization | FT-P-19, **C2-IT-04** | Covered |

---

## Component-Internal Tests

### C2-IT-01: top-K=10 recall at p=10 on Derkachi

**Summary**: the primary backbone (UltraVPR) achieves recall@10 ≥ 0.95 on the Derkachi normal segment against the pre-built corpus.

**Traces to**: AC-2.1b

**Description**: for each query frame in `flight_derkachi/normal_segment_60_stills/`, embed via UltraVPR and query the FAISS HNSW index; assert that the ground-truth tile (per recorded GPS + sector classification) is within the top-10 candidates for at least 95% of frames.

**Input data**: `flight_derkachi/normal_segment_60_stills/` + the Derkachi corpus FAISS index built by C10 (consumed read-only here).

**Expected result**: recall@10 ≥ 0.95 for UltraVPR; recall@10 ≥ 0.85 for the mandatory simple-baseline NetVLAD (engine rule check).

**Max execution time**: 90 s.
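
The recall check above can be sketched as follows. This is a minimal illustration, not the test's actual implementation: `recall_at_k` and the toy tile IDs are hypothetical, and the real test would obtain the ranked candidates from UltraVPR plus the FAISS index. If the segment holds 60 stills, as the directory name suggests, recall@10 ≥ 0.95 means at least 57 frames must hit.

```python
from typing import Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[str]],
    truth_ids: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of queries whose ground-truth tile is among the first k candidates."""
    if len(ranked_ids) != len(truth_ids):
        raise ValueError("one ground-truth tile ID is required per query")
    if not truth_ids:
        raise ValueError("recall is undefined for an empty query set")
    hits = sum(
        1 for ranked, truth in zip(ranked_ids, truth_ids) if truth in ranked[:k]
    )
    return hits / len(truth_ids)


# Toy data (hypothetical tile IDs): the ground truth is retrieved for 2 of 3 queries.
ranked = [["t1", "t7"], ["t3", "t2"], ["t9", "t4"]]
truth = ["t7", "t2", "t5"]
recall = recall_at_k(ranked, truth, k=10)
```

The same helper would serve both thresholds in the expected result (0.95 for UltraVPR, 0.85 for the NetVLAD baseline) by comparing its return value against the relevant floor.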

---

### C2-IT-02: VprResult schema invariants

**Summary**: every `VprResult` carries `len(candidates) == k`, monotonically non-decreasing `descriptor_distance` values, and a non-empty `backbone_label`.

**Traces to**: AC-2.2 (downstream-coverage prerequisite)

**Description**: 100 frames through `retrieve_topk(k=10)`; assert (a) length, (b) sorted-ascending distance, (c) `backbone_label` non-empty.

**Input data**: `tests/fixtures/synthetic_vpr/diverse_100f/`.

**Expected result**: 100/100 results pass all invariants.

**Max execution time**: 10 s.
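
A minimal sketch of the three invariant checks, assuming a simplified result shape. Only `candidates`, `descriptor_distance`, and `backbone_label` come from this specification; the `Candidate` and `VprResult` dataclasses, the `tile_id` field, and `invariant_violations` are hypothetical stand-ins for the real types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:  # hypothetical shape
    tile_id: str
    descriptor_distance: float


@dataclass
class VprResult:  # hypothetical shape; the real class may carry more fields
    candidates: List[Candidate]
    backbone_label: str


def invariant_violations(result: VprResult, k: int) -> List[str]:
    """Return a human-readable message for each violated invariant."""
    problems = []
    # (a) exactly k candidates
    if len(result.candidates) != k:
        problems.append(f"expected {k} candidates, got {len(result.candidates)}")
    # (b) distances sorted ascending; ties are allowed
    distances = [c.descriptor_distance for c in result.candidates]
    if any(later < earlier for earlier, later in zip(distances, distances[1:])):
        problems.append("descriptor_distance is not sorted ascending")
    # (c) non-empty backbone label
    if not result.backbone_label:
        problems.append("backbone_label is empty")
    return problems


good = VprResult(
    [Candidate("a", 0.1), Candidate("b", 0.1), Candidate("c", 0.4)], "UltraVPR"
)
bad = VprResult([Candidate("a", 0.5), Candidate("b", 0.2)], "")
```

Returning a list of messages rather than asserting on the first failure lets one test run report every broken invariant for a frame.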

---

### C2-IT-03: cache-poisoning seed rejection at retrieval

**Summary**: when the corpus contains a poisoned tile injected during NFT-SEC-01 setup, the rate at which that tile is retrieved as top-1 is bounded so downstream RANSAC + voting can reject it within the AC-NEW-7 relaxed budget.

**Traces to**: AC-NEW-7 (component-level partition)

**Description**: load the NFT-SEC-01 setup corpus (3-flight cumulative dataset with deflated covariance × 1.5–3); query each Derkachi frame; record the top-1 candidate for each. Pass if the fraction of frames for which the poisoned tile is top-1 stays below the relaxed-CI threshold (the relaxation per AC-text 2026-05-09).

**Input data**: NFT-SEC-01 setup corpus.

**Expected result**: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.

**Max execution time**: 5 min.
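
The pass condition reduces to a rate comparison, sketched below. The helper name and tile IDs are hypothetical, and `MAX_RATE` is a placeholder only: the actual bound is the AC-NEW-7 relaxed CI, which this document does not state numerically.

```python
from typing import Sequence


def poisoned_top1_rate(top1_ids: Sequence[str], poisoned_id: str) -> float:
    """Fraction of queries whose top-1 candidate is the poisoned tile."""
    if not top1_ids:
        raise ValueError("no queries recorded")
    return sum(1 for tile_id in top1_ids if tile_id == poisoned_id) / len(top1_ids)


# Placeholder threshold for illustration; NOT the AC-NEW-7 value.
MAX_RATE = 0.25

# Toy data: the poisoned tile wins top-1 for 1 of 10 queries.
top1 = ["t1", "poison", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
rate = poisoned_top1_rate(top1, "poison")
passed = rate < MAX_RATE
```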

---

### C2-IT-04: scale-ratio invariance on satellite re-loc

**Summary**: when the nav-camera scale changes (e.g., altitude shift), VPR top-K still surfaces the correct tile.

**Traces to**: AC-8.6 (scale-ratio half)

**Description**: synthetically scale the Derkachi normal-segment frames by ±20% (mimicking altitude variation); assert recall@10 stays ≥ 0.85 across the scale sweep.

**Input data**: `flight_derkachi/scaled_+20%/` and `flight_derkachi/scaled_-20%/` (generated via deterministic resize).

**Expected result**: recall@10 ≥ 0.85 at both scale extremes for UltraVPR.

**Max execution time**: 90 s.
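
The specification says only that the scaled fixtures come from a "deterministic resize". The sketch below uses nearest-neighbour sampling on a plain list-of-rows image as one possible deterministic choice. That choice is an assumption; the real fixture script may use an image library with a fixed interpolation mode instead.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (illustrative format)


def resize_nearest(image: Image, scale: float) -> Image:
    """Nearest-neighbour resize: no randomness, so output depends only on the input."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [
            # Map each output pixel back to its source pixel, clamped to bounds.
            image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


# 10x10 toy frame with a unique value per pixel.
frame = [[row * 10 + col for col in range(10)] for row in range(10)]
enlarged = resize_nearest(frame, 1.2)  # corresponds to the +20% fixture
reduced = resize_nearest(frame, 0.8)   # corresponds to the -20% fixture
```

Because the function has no hidden state, regenerating the fixtures yields byte-identical inputs, which keeps the recall@10 result reproducible across runs.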

---

## Performance Tests

### C2-PT-01: backbone forward + HNSW lookup budget on Tier-2

**Traces to**: AC-4.1

**Load scenario**: 3 Hz frame rate, 10 min replay; corpus size 87 654 tiles (Derkachi area at 0.5 m/px).

**Expected results**:

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| `embed_query` p95 | ≤ 60 ms (UltraVPR / FP16) | 100 ms |
| `retrieve_topk(k=10)` p95 | ≤ 2 ms (HNSW) | 10 ms |
| Combined p95 | ≤ 65 ms | 110 ms |

**Resource limits**:
- GPU memory: ≤ 600 MB resident for backbone weights.
- System memory: ≤ 200 MB for the mmap'd FAISS index handle.
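
At 3 Hz over 10 min, the replay yields 3 × 600 = 1800 latency samples per metric. The sketch below shows one way to reduce them to a p95 and compare it with the table. The nearest-rank percentile definition and the three-way `classify` outcome ("degraded" between target and failure threshold) are assumptions; the specification gives the two thresholds but does not define the percentile method or what the band between them means.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def classify(p95_ms: float, target_ms: float, failure_ms: float) -> str:
    """Assumed semantics: at or under target passes; over the failure threshold fails."""
    if p95_ms <= target_ms:
        return "pass"
    if p95_ms <= failure_ms:
        return "degraded"
    return "fail"


# Toy latencies of 1..100 ms; the nearest-rank p95 is the 95th ordered sample.
latencies_ms = [float(v) for v in range(1, 101)]
p95 = percentile_nearest_rank(latencies_ms, 95)
embed_verdict = classify(p95, target_ms=60, failure_ms=100)
```

For the full 1800-sample replay, the nearest-rank p95 is the 1710th ordered sample.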

---

## Security Tests

### C2-ST-01: index-handle invalidation safety

**Summary**: after C10 rebuilds the FAISS index, the previous handle held by C2 must not silently return stale results. A post-takeoff rebuild is FORBIDDEN, so this is a unit-level safety check only.

**Traces to**: defensive — no AC trace; backstops a code-injection / config-drift mode that AC-NEW-7 already covers at the suite level.

**Test procedure**:
1. Build a tiny 100-tile FAISS index; mmap it through C2.
2. Replace the underlying file (simulating an out-of-band rebuild, which is FORBIDDEN at flight time per D-C10-3 but defended-in-depth here).
3. Call `retrieve_topk` and assert C2 raises `IndexUnavailableError` rather than returning stale candidates.

**Pass criteria**: `IndexUnavailableError` raised; no candidates returned.

**Fail criteria**: any candidate returned.
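
One way C2 could detect the replaced file is to fingerprint it at map time and re-check before each lookup. The sketch below is a hedged illustration of that idea, not C2's actual mechanism: `GuardedIndexHandle` and `ensure_fresh` are hypothetical, the `IndexUnavailableError` definition is a stand-in for the real exception, and the stat-based fingerprint is an assumption (a content hash would be stricter but slower).

```python
import os
import tempfile


class IndexUnavailableError(RuntimeError):
    """Stand-in definition; the name comes from the spec."""


class GuardedIndexHandle:  # hypothetical wrapper around the mapped index file
    def __init__(self, path: str) -> None:
        self._path = path
        self._fingerprint = self._stat_fingerprint()

    def _stat_fingerprint(self):
        st = os.stat(self._path)
        return (st.st_ino, st.st_size, st.st_mtime_ns)

    def ensure_fresh(self) -> None:
        """Raise if the file on disk no longer matches what was mapped."""
        try:
            current = self._stat_fingerprint()
        except FileNotFoundError as exc:
            raise IndexUnavailableError("index file is missing") from exc
        if current != self._fingerprint:
            raise IndexUnavailableError("index file changed since it was mapped")


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "tiny.index")
    with open(index_path, "wb") as fh:
        fh.write(b"original-index")
    handle = GuardedIndexHandle(index_path)
    handle.ensure_fresh()  # unchanged file: no exception

    # Simulate the out-of-band rebuild from step 2 by swapping in a new file.
    rebuilt_path = os.path.join(tmp, "rebuilt.index")
    with open(rebuilt_path, "wb") as fh:
        fh.write(b"rebuilt-index-with-new-content")
    os.replace(rebuilt_path, index_path)

    try:
        handle.ensure_fresh()
        stale_detected = False
    except IndexUnavailableError:
        stale_detected = True
```

In this design `retrieve_topk` would call `ensure_fresh()` before searching, so a stale handle fails before any candidate is returned. An in-place overwrite that preserves size and timestamp would evade a stat-only check, which is the main reason to consider a content hash.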

---

## Acceptance Tests

C2 contributes to AC-2.1b / AC-8.6 via the suite-level FT scenarios. No additional C2-only acceptance tests.

---

## Test Data Management

| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `synthetic_vpr/diverse_100f/` | 100 diverse synthetic frames for invariant checks | generated, deterministic | ~50 MB |
| `flight_derkachi/normal_segment_60_stills/` | shared with C1 | curated | shared |
| `flight_derkachi/scaled_+20%/`, `flight_derkachi/scaled_-20%/` | scale-ratio sweep (both extremes) | generated from `normal_segment_60_stills/` | ~40 MB |
| Derkachi FAISS corpus | C10's output, consumed read-only | C10 build artifact | ~200 MB |

**Setup procedure**:
1. C10 must have built the Derkachi FAISS index (this is a Step-5-test-side prereq; in the test runner, the corpus is staged from `tests/fixtures/cache_artifacts/`).
2. Synthetic and scaled fixtures are generated by deterministic scripts.

**Teardown**: corpus is read-only; nothing to clean up.

**Data isolation**: each test gets its own `tests/tmp/c2/<test-id>/`.
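
The per-test directory rule could be enforced with a small helper like the one sketched here. `fresh_test_dir` is hypothetical, and wiping a leftover directory from an earlier run is an assumed policy that the specification does not spell out. The demo uses a temporary root in place of `tests/tmp/`.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_test_dir(tmp_root: Path, test_id: str) -> Path:
    """Create an empty <tmp_root>/c2/<test-id>/ directory, discarding any leftovers."""
    target = tmp_root / "c2" / test_id
    if target.exists():
        shutil.rmtree(target)  # assumed policy: never reuse a previous run's files
    target.mkdir(parents=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for tests/tmp/
    first = fresh_test_dir(root, "C2-IT-02")
    (first / "leftover.bin").write_bytes(b"stale")
    second = fresh_test_dir(root, "C2-IT-02")
    leftovers = list(second.iterdir())
    relative = second.relative_to(root).as_posix()
```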