Update autodev state, architecture documentation, and glossary terms

Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
Oleksandr Bezdieniezhnykh
2026-05-10 00:21:34 +03:00
parent 723f574b14
commit 64542d32fc
52 changed files with 8789 additions and 88 deletions
@@ -0,0 +1,144 @@
# Test Specification — C2 Visual Place Recognition
Component-scoped. Suite-level coverage is in `_docs/02_document/tests/*.md`; canonical traceability is `tests/traceability-matrix.md`.
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-2.1b | Satellite-anchor registration meets AC-1.1/1.2/2.2/8.2/8.6 | FT-P-05, FT-P-19, **C2-IT-01** | Covered |
| AC-2.2 (cross-domain portion) | MRE <2.5 px cross-domain | FT-P-06, **C2-IT-02** (recall floor only — MRE owned by C3) | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01, **C2-PT-01** | Covered |
| AC-NEW-7 | Cache poisoning safety budget | NFT-SEC-01 (onboard-side), **C2-IT-03** | Covered (relaxed) |
| AC-8.6 (scale-ratio portion) | Scale-ratio satellite relocalization | FT-P-19, **C2-IT-04** | Covered |
---
## Component-Internal Tests
### C2-IT-01: recall@10 (top-K=10) on Derkachi
**Summary**: the primary backbone (UltraVPR) achieves recall@10 ≥ 0.95 on the Derkachi normal segment against the pre-built corpus.
**Traces to**: AC-2.1b
**Description**: for each query frame in `flight_derkachi/normal_segment_60_stills/`, embed via UltraVPR; query the FAISS HNSW index; assert that the ground-truth tile (per recorded GPS + sector classification) is within the top-10 candidates for ≥95% of frames.
**Input data**: `flight_derkachi/normal_segment_60_stills/` + the Derkachi corpus FAISS index built by C10 (consumed read-only here).
**Expected result**: recall@10 ≥ 0.95 for UltraVPR; recall@10 ≥ 0.85 for the mandatory simple-baseline NetVLAD (engine rule check).
**Max execution time**: 90 s.
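A minimal sketch of the recall@10 pass/fail arithmetic. It assumes the harness has already resolved each frame's ground-truth tile ID (from recorded GPS + sector classification) and collected the tile IDs returned by the FAISS query, in rank order. `recall_at_k`, `meets_floor` and `RECALL_FLOOR` are hypothetical helper names, not part of C2's API.

```python
from typing import Dict, Sequence

# Floors taken from the C2-IT-01 expected result (hypothetical constant name).
RECALL_FLOOR: Dict[str, float] = {"UltraVPR": 0.95, "NetVLAD": 0.85}


def recall_at_k(
    ground_truth: Sequence[str],
    candidates: Sequence[Sequence[str]],
    k: int = 10,
) -> float:
    """Fraction of query frames whose ground-truth tile ID is among the first k candidates."""
    if len(ground_truth) != len(candidates):
        raise ValueError("need exactly one candidate list per query frame")
    if not ground_truth:
        raise ValueError("need at least one query frame")
    hits = sum(gt in list(cands)[:k] for gt, cands in zip(ground_truth, candidates))
    return hits / len(ground_truth)


def meets_floor(backbone: str, recall: float) -> bool:
    """True if the measured recall@10 reaches the floor for this backbone."""
    return recall >= RECALL_FLOOR[backbone]
```

A ground-truth tile found at rank 11 counts as a miss, so 19 hits out of 20 frames gives exactly 0.95, which still meets the UltraVPR floor.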
---
### C2-IT-02: VprResult schema invariants
**Summary**: every `VprResult` carries `len(candidates) == k`, monotonically non-decreasing `descriptor_distance`, and a non-empty `backbone_label`.
**Traces to**: AC-2.2 (downstream-coverage prerequisite)
**Description**: run 100 frames through `retrieve_topk(k=10)`; assert (a) length, (b) distances sorted in ascending (non-decreasing) order, (c) `backbone_label` non-empty.
**Input data**: `tests/fixtures/synthetic_vpr/diverse_100f/`.
**Expected result**: 100/100 results pass all invariants.
**Max execution time**: 10 s.
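The three invariants can be expressed as a single checker that reports which ones a result violates. This is a sketch under an assumed data layout: here `descriptor_distance` lives on each candidate and `VprResult` has only `candidates` and `backbone_label`; the real C2 types may differ, and `Candidate` and `invariant_violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    tile_id: str
    descriptor_distance: float  # assumed per-candidate field


@dataclass(frozen=True)
class VprResult:
    candidates: List[Candidate]
    backbone_label: str


def invariant_violations(result: VprResult, k: int) -> List[str]:
    """Names of violated invariants; an empty list means the result passes."""
    violations: List[str] = []
    # (a) exactly k candidates
    if len(result.candidates) != k:
        violations.append("length")
    # (b) distances never decrease from one rank to the next (ties allowed)
    distances = [c.descriptor_distance for c in result.candidates]
    if any(a > b for a, b in zip(distances, distances[1:])):
        violations.append("ordering")
    # (c) backbone label present
    if not result.backbone_label:
        violations.append("backbone_label")
    return violations
```

Returning all violations rather than stopping at the first one makes a 100-frame run report every broken invariant at once.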
---
### C2-IT-03: cache-poisoning seed rejection at retrieval
**Summary**: when the corpus contains a poisoned tile injected during NFT-SEC-01 setup, the top-1 distance to that tile is bounded so downstream RANSAC + voting can reject it within the AC-NEW-7 relaxed budget.
**Traces to**: AC-NEW-7 (component-level partition)
**Description**: load the NFT-SEC-01 setup corpus (3-flight cumulative dataset with deflated covariance × 1.53); query each Derkachi frame; record the top-1 candidate for each. Pass if the fraction of frames whose top-1 candidate is the poisoned tile is below the relaxed-CI threshold (relaxation per the AC text dated 2026-05-09).
**Input data**: NFT-SEC-01 setup corpus.
**Expected result**: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.
**Max execution time**: 5 min.
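The pass condition reduces to a rate compared against a threshold. This spec does not state the relaxed-CI threshold value (it lives in the AC-NEW-7 text), so the sketch takes it as a parameter and does not supply a default. Both function names are hypothetical, and treating "fewer than" as a strict inequality is an interpretation.

```python
from typing import Sequence


def poisoned_top1_rate(top1_tile_ids: Sequence[str], poisoned_tile_id: str) -> float:
    """Fraction of query frames whose top-1 candidate is the poisoned tile."""
    if not top1_tile_ids:
        raise ValueError("no query frames were recorded")
    hits = sum(tile_id == poisoned_tile_id for tile_id in top1_tile_ids)
    return hits / len(top1_tile_ids)


def within_relaxed_budget(rate: float, relaxed_ci_threshold: float) -> bool:
    """Pass only if the rate is strictly below the threshold ("fewer than")."""
    if not 0.0 <= relaxed_ci_threshold <= 1.0:
        raise ValueError("threshold must be a fraction in [0, 1]")
    return rate < relaxed_ci_threshold
```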
---
### C2-IT-04: scale-ratio invariance on satellite re-loc
**Summary**: when the nav-camera scale changes (e.g., altitude shift), VPR top-K still surfaces the correct tile.
**Traces to**: AC-8.6 (scale-ratio half)
**Description**: synthetically scale the Derkachi normal-segment frames by ±20% (mimicking altitude variation); assert recall@10 stays ≥ 0.85 across the scale sweep.
**Input data**: `flight_derkachi/scaled_+20%/` and `flight_derkachi/scaled_-20%/` (generated via deterministic resize).
**Expected result**: recall@10 ≥ 0.85 at both scale extremes for UltraVPR.
**Max execution time**: 90 s.
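A sketch of the sweep bookkeeping. The spec says only that the scaled fixtures come from a "deterministic resize". Computing target sizes with round-half-up is an assumption here, and the resampling filter is left to whichever image library the fixture script uses. `recall_at_scale` stands in for a hypothetical callable that runs the recall@10 measurement on the fixture for a given factor.

```python
from typing import Callable, Dict, Tuple

# The two extremes of the ±20% sweep.
SCALE_FACTORS: Tuple[float, ...] = (0.8, 1.2)
RECALL_FLOOR = 0.85  # UltraVPR floor from the C2-IT-04 expected result


def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Deterministic target size: round half up, never below 1 px (assumed rule)."""
    return (max(1, int(width * factor + 0.5)), max(1, int(height * factor + 0.5)))


def scale_sweep(recall_at_scale: Callable[[float], float]) -> Dict[float, bool]:
    """Map each scale factor to whether recall@10 meets the floor at that scale."""
    return {f: recall_at_scale(f) >= RECALL_FLOOR for f in SCALE_FACTORS}
```

The test passes only if every value in the returned mapping is `True`, since both extremes must hold.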
---
## Performance Tests
### C2-PT-01: backbone forward + HNSW lookup budget on Tier-2
**Traces to**: AC-4.1
**Load scenario**: 3 Hz frame rate, 10 min replay; corpus size 87 654 tiles (Derkachi area at 0.5 m/px).
**Expected results**:
| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| `embed_query` p95 | ≤ 60 ms (UltraVPR / FP16) | 100 ms |
| `retrieve_topk(k=10)` p95 | ≤ 2 ms (HNSW) | 10 ms |
| Combined p95 | ≤ 65 ms | 110 ms |
**Resource limits**:
- GPU memory: ≤ 600 MB resident for backbone weights.
- System memory: ≤ 200 MB for the mmap'd FAISS index handle.
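One way to turn the recorded latencies into a verdict per table row. The replay yields 3 Hz × 600 s = 1800 samples per metric. The table gives a target and a failure threshold but does not say how to classify a p95 that falls between them, so the `"over-target"` band is an assumption, as is the nearest-rank percentile definition. All names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (target_ms, failure_threshold_ms) per metric, from the C2-PT-01 table.
BUDGETS_MS: Dict[str, Tuple[float, float]] = {
    "embed_query": (60.0, 100.0),
    "retrieve_topk": (2.0, 10.0),
    "combined": (65.0, 110.0),
}

EXPECTED_SAMPLES = 3 * 10 * 60  # 3 Hz for a 10 min replay


def p95_nearest_rank(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile; integer math avoids float rounding in the rank."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def verdict(metric: str, p95_ms: float) -> str:
    target_ms, failure_ms = BUDGETS_MS[metric]
    if p95_ms <= target_ms:
        return "pass"
    if p95_ms < failure_ms:
        return "over-target"  # assumed band between target and failure threshold
    return "fail"  # assumption: reaching the failure threshold counts as failure
```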
---
## Security Tests
### C2-ST-01: index-handle invalidation safety
**Summary**: after C10 rebuilds the FAISS index (a post-takeoff rebuild is FORBIDDEN; this is a unit-level safety check), the previous handle held by C2 must not silently return stale results.
**Traces to**: defensive — no AC trace; backstops a code-injection / config-drift mode that AC-NEW-7 already covers at the suite level.
**Test procedure**:
1. Build a tiny 100-tile FAISS index; mmap it through C2.
2. Replace the underlying file (simulating an out-of-band rebuild, which is FORBIDDEN at flight time per D-C10-3 but defended-in-depth here).
3. Call `retrieve_topk` and assert C2 raises `IndexUnavailableError` rather than returning stale candidates.
**Pass criteria**: `IndexUnavailableError` raised; no candidates returned.
**Fail criteria**: any candidate returned.
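The reason stale results can be silent: on POSIX, replacing a file by rename leaves an existing mmap pointing at the old inode, so searches keep succeeding against the old data. A minimal defence, sketched here under assumptions, is to capture the file's stat identity (device, inode, size, mtime) when the handle is opened and re-check it before each `retrieve_topk`. `GuardedIndexHandle` is a hypothetical name. Only `IndexUnavailableError` comes from this spec, and its base class here is assumed. A stat check is not atomic with the search itself, so it narrows the race window rather than closing it.

```python
import os
from typing import Tuple


class IndexUnavailableError(RuntimeError):
    """Raised when the on-disk index no longer matches the opened handle."""


class GuardedIndexHandle:
    def __init__(self, path: str) -> None:
        self._path = path
        self._identity = self._stat_identity()  # snapshot taken at open time

    def _stat_identity(self) -> Tuple[int, int, int, int]:
        st = os.stat(self._path)
        return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)

    def check(self) -> None:
        """Call before every search; raise instead of serving stale candidates."""
        try:
            current = self._stat_identity()
        except FileNotFoundError as exc:
            raise IndexUnavailableError(f"index file missing: {self._path}") from exc
        if current != self._identity:
            raise IndexUnavailableError(f"index file changed since open: {self._path}")
```

In the C2-ST-01 procedure, step 2's out-of-band rebuild changes the inode (and here the size), so the next `check()` raises before any candidate is produced.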
---
## Acceptance Tests
C2 contributes to AC-2.1b / AC-8.6 via the suite-level FT scenarios. No additional C2-only acceptance tests.
---
## Test Data Management
| Data Set | Description | Source | Size |
|----------|-------------|--------|------|
| `synthetic_vpr/diverse_100f/` | 100 diverse synthetic frames for invariant checks | generated, deterministic | ~50 MB |
| `flight_derkachi/normal_segment_60_stills/` | shared with C1 | curated | shared |
| `flight_derkachi/scaled_±20%/` | scale-ratio sweep | generated from above | ~40 MB |
| Derkachi FAISS corpus | C10's output, consumed read-only | C10 build artifact | ~200 MB |
**Setup procedure**:
1. C10 must have built the Derkachi FAISS index (a Step 5 test-side prerequisite; in the test runner, the corpus is staged from `tests/fixtures/cache_artifacts/`).
2. Synthetic and scaled fixtures are generated by deterministic scripts.
**Teardown**: corpus is read-only; nothing to clean up.
**Data isolation**: each test gets its own `tests/tmp/c2/<test-id>/`.
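A sketch of the per-test isolation rule. The helper name `c2_workdir`, the test-ID pattern (which matches only the ID families used in this spec) and the choice to wipe leftovers from a previous run are all assumptions. The read-only corpus is never placed in this directory.

```python
import re
import shutil
from pathlib import Path

# Accepts IDs such as C2-IT-01, C2-PT-01, C2-ST-01 (assumed pattern).
_TEST_ID = re.compile(r"C2-(IT|PT|ST)-\d{2}")


def c2_workdir(test_id: str, root: Path = Path("tests/tmp/c2")) -> Path:
    """Return a fresh, empty scratch directory for one C2 test."""
    if not _TEST_ID.fullmatch(test_id):
        # Rejecting free-form IDs also blocks path traversal like "../x".
        raise ValueError(f"unexpected test id: {test_id!r}")
    workdir = root / test_id
    if workdir.exists():
        shutil.rmtree(workdir)  # drop leftovers from a previous run
    workdir.mkdir(parents=True)
    return workdir
```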