Oleksandr Bezdieniezhnykh 64542d32fc Update autodev state, architecture documentation, and glossary terms
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
2026-05-10 00:21:34 +03:00


Test Specification — C2 Visual Place Recognition

Component-scoped. Suite-level coverage is in _docs/02_document/tests/*.md; canonical traceability is tests/traceability-matrix.md.

Acceptance Criteria Traceability

| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|---|---|---|---|
| AC-2.1b | Satellite-anchor registration meets AC-1.1/1.2/2.2/8.2/8.6 | FT-P-05, FT-P-19, C2-IT-01 | Covered |
| AC-2.2 (cross-domain portion) | MRE <2.5 px cross-domain | FT-P-06, C2-IT-02 (recall floor only; MRE owned by C3) | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01, C2-PT-01 | Covered |
| AC-NEW-7 | Cache poisoning safety budget | NFT-SEC-01 (onboard-side), C2-IT-03 | Covered (relaxed) |
| AC-8.6 (scale-ratio portion) | Scale-ratio satellite relocalization | FT-P-19, C2-IT-04 | Covered |

Component-Internal Tests

C2-IT-01: top-K=10 recall at p=10 on Derkachi

Summary: the primary backbone (UltraVPR) achieves recall@10 ≥ 0.95 on the Derkachi normal segment against the pre-built corpus.

Traces to: AC-2.1b

Description: for each query frame in flight_derkachi/normal_segment_60_stills/, embed it via UltraVPR and query the FAISS HNSW index; assert that the ground-truth tile (per recorded GPS + sector classification) is within the top-10 candidates for at least 95% of frames.

Input data: flight_derkachi/normal_segment_60_stills/ + the Derkachi corpus FAISS index built by C10 (consumed read-only here).

Expected result: recall@10 ≥ 0.95 for UltraVPR; recall@10 ≥ 0.85 for the mandatory simple-baseline NetVLAD (engine rule check).

Max execution time: 90 s.
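The recall@10 assertion above can be sketched as follows. This is a minimal illustration, not the project's test code: the helper name `recall_at_k`, the tile-ID strings, and the toy data are assumptions made for the example, and the real test would obtain the ranked candidates from the UltraVPR embedding and the FAISS HNSW lookup.

```python
from typing import Sequence


def recall_at_k(ranked_tile_ids: Sequence[Sequence[str]],
                ground_truth: Sequence[str],
                k: int = 10) -> float:
    """Fraction of queries whose ground-truth tile is among the first k candidates."""
    if len(ranked_tile_ids) != len(ground_truth):
        raise ValueError("one ground-truth tile is required per query")
    if not ground_truth:
        raise ValueError("no queries supplied")
    hits = sum(gt in ranked[:k] for ranked, gt in zip(ranked_tile_ids, ground_truth))
    return hits / len(ground_truth)


# Toy data (hypothetical): 60 queries, each with 10 ranked candidates.
ranked = [[f"tile_{i}"] + [f"other_{j}" for j in range(9)] for i in range(60)]
truth = [f"tile_{i}" for i in range(60)]
truth[0] = "missing_a"  # force two misses
truth[1] = "missing_b"

score = recall_at_k(ranked, truth, k=10)  # 58 hits out of 60
assert score >= 0.95
```

If the segment holds 60 frames, as its directory name suggests, a 0.95 floor tolerates at most 3 misses (57/60 = 0.95).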


C2-IT-02: VprResult schema invariants

Summary: every VprResult carries len(candidates) == k, monotonically non-decreasing descriptor_distance, and a non-empty backbone_label.

Traces to: AC-2.2 (downstream-coverage prerequisite)

Description: 100 frames through retrieve_topk(k=10); assert (a) length, (b) sorted-ascending distance, (c) backbone_label non-empty.

Input data: tests/fixtures/synthetic_vpr/diverse_100f/.

Expected result: 100/100 results pass all invariants.

Max execution time: 10 s.
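A sketch of the three invariant checks, under stated assumptions: the field names `candidates`, `descriptor_distance`, and `backbone_label` come from this specification, while the `Candidate` class, its `tile_id` field, and the `check_invariants` helper are hypothetical stand-ins for whatever C2 actually defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    tile_id: str                # hypothetical field, for illustration only
    descriptor_distance: float


@dataclass(frozen=True)
class VprResult:
    candidates: List[Candidate]
    backbone_label: str


def check_invariants(result: VprResult, k: int) -> List[str]:
    """Return a list of violated invariants; an empty list means the result passes."""
    violations = []
    if len(result.candidates) != k:
        violations.append(f"expected {k} candidates, got {len(result.candidates)}")
    distances = [c.descriptor_distance for c in result.candidates]
    # Non-decreasing: equal neighbours are allowed, a drop is not.
    if any(later < earlier for earlier, later in zip(distances, distances[1:])):
        violations.append("descriptor_distance is not sorted ascending")
    if not result.backbone_label:
        violations.append("backbone_label is empty")
    return violations


good = VprResult([Candidate(f"t{i}", 0.1 * i) for i in range(10)], "UltraVPR")
bad = VprResult([Candidate("t0", 0.5), Candidate("t1", 0.2)], "")
```

Returning the list of violations rather than a bare boolean keeps the failure message specific when one of the 100 frames breaks an invariant.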


C2-IT-03: cache-poisoning seed rejection at retrieval

Summary: when the corpus contains a poisoned tile injected during NFT-SEC-01 setup, the top-1 distance to that tile is bounded so downstream RANSAC + voting can reject it within the AC-NEW-7 relaxed budget.

Traces to: AC-NEW-7 (component-level partition)

Description: load the NFT-SEC-01 setup corpus (3-flight cumulative dataset with deflated covariance × 1.53); query each Derkachi frame and record its top-1 candidate. Pass if the fraction of frames whose top-1 candidate is the poisoned tile stays below the relaxed-CI threshold (the relaxation per AC-text 2026-05-09).

Input data: NFT-SEC-01 setup corpus.

Expected result: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.

Max execution time: 5 min.
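The pass criterion reduces to a rate comparison, sketched below. The helper names and the toy data are assumptions; in particular `max_rate` is a placeholder parameter, because the numeric AC-NEW-7 relaxed threshold is not stated in this document and must come from the AC text.

```python
from typing import Sequence


def poisoned_top1_rate(top1_tile_ids: Sequence[str], poisoned_tile_id: str) -> float:
    """Fraction of query frames whose top-1 candidate is the poisoned tile."""
    if not top1_tile_ids:
        raise ValueError("no queries supplied")
    hits = sum(tile_id == poisoned_tile_id for tile_id in top1_tile_ids)
    return hits / len(top1_tile_ids)


def passes_budget(rate: float, max_rate: float) -> bool:
    """Pass only if the observed rate is strictly below the supplied threshold."""
    return rate < max_rate


# Toy data (hypothetical): 20 frames, 5 of which rank the poisoned tile first.
top1 = ["tile_a", "tile_b", "POISON", "tile_c"] * 5
rate = poisoned_top1_rate(top1, "POISON")

# 0.5 is a placeholder only, NOT the AC-NEW-7 value.
ok = passes_budget(rate, max_rate=0.5)
```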


C2-IT-04: scale-ratio invariance on satellite re-loc

Summary: when the nav-camera scale changes (e.g., altitude shift), VPR top-K still surfaces the correct tile.

Traces to: AC-8.6 (scale-ratio half)

Description: synthetically scale the Derkachi normal-segment frames by ±20% (mimicking altitude variation); assert recall@10 stays ≥ 0.85 across the scale sweep.

Input data: flight_derkachi/scaled_+20%/ and flight_derkachi/scaled_-20%/ (generated via deterministic resize).

Expected result: recall@10 ≥ 0.85 at both scale extremes for UltraVPR.

Max execution time: 90 s.
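To illustrate what "deterministic resize" can mean for the ±20% fixtures, here is a dependency-free nearest-neighbour sketch on a nested-list image. It is an assumption for illustration only: the real fixture script may use a different interpolation method, an image library, or crop and pad back to the original frame size.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values


def resize_nearest(image: Image, scale: float) -> Image:
    """Nearest-neighbour resize; the same input and scale always give the same output."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [
            # Map each destination pixel back to its source pixel, clamped to bounds.
            image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


frame = [[10 * r + c for c in range(10)] for r in range(10)]  # toy 10x10 frame
scaled_up = resize_nearest(frame, 1.2)    # +20% -> 12x12
scaled_down = resize_nearest(frame, 0.8)  # -20% -> 8x8
```

Because the function involves no randomness, regenerating the fixtures yields byte-identical inputs, which keeps the recall@10 result reproducible across runs.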


Performance Tests

C2-PT-01: backbone forward + HNSW lookup budget on Tier-2

Traces to: AC-4.1

Load scenario: 3 Hz frame rate, 10 min replay; corpus size 87 654 tiles (Derkachi area at 0.5 m/px).

Expected results:

| Metric | Target | Failure Threshold |
|---|---|---|
| embed_query p95 | ≤ 60 ms (UltraVPR / FP16) | 100 ms |
| retrieve_topk(k=10) p95 | ≤ 2 ms (HNSW) | 10 ms |
| Combined p95 | ≤ 65 ms | 110 ms |

Resource limits:

  • GPU memory: ≤ 600 MB resident for backbone weights.
  • System memory: ≤ 200 MB for the mmap'd FAISS index handle.
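A sketch of how the p95 latency budgets could be evaluated from recorded samples. The target and failure values are taken from the expected-results table; the nearest-rank percentile definition, the metric keys, and the three-way `pass` / `above_target` / `fail` classification are assumptions, since the document does not say how p95 is computed or how the band between target and failure threshold is reported.

```python
from typing import Dict, Sequence, Tuple

# (target_ms, failure_threshold_ms) per metric, from the expected-results table.
BUDGETS_MS: Dict[str, Tuple[float, float]] = {
    "embed_query": (60.0, 100.0),
    "retrieve_topk": (2.0, 10.0),
    "combined": (65.0, 110.0),
}


def p95_nearest_rank(samples_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (integer arithmetic, no rounding drift)."""
    if not samples_ms:
        raise ValueError("no samples supplied")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def classify(metric: str, p95_ms: float) -> str:
    """Assumed policy: at or under target passes; over the failure threshold fails."""
    target, failure = BUDGETS_MS[metric]
    if p95_ms <= target:
        return "pass"
    if p95_ms <= failure:
        return "above_target"
    return "fail"


# Toy samples (hypothetical): 95 fast calls and 5 slow outliers.
embed_samples = [40.0] * 95 + [120.0] * 5
embed_p95 = p95_nearest_rank(embed_samples)
verdict = classify("embed_query", embed_p95)
```

At 3 Hz over a 10 min replay the load scenario produces 1800 frames, so the nearest-rank p95 is the 1710th smallest sample.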

Security Tests

C2-ST-01: index-handle invalidation safety

Summary: after C10 rebuilds the FAISS index, the previous handle held by C2 must not silently return stale results. A post-takeoff rebuild is FORBIDDEN, so this is a unit-level safety check only.

Traces to: defensive — no AC trace; backstops a code-injection / config-drift mode that AC-NEW-7 already covers at the suite level.

Test procedure:

  1. Build a tiny 100-tile FAISS index; mmap it through C2.
  2. Replace the underlying file (simulating an out-of-band rebuild, which is FORBIDDEN at flight time per D-C10-3 but defended-in-depth here).
  3. Call retrieve_topk and assert C2 raises IndexUnavailableError rather than returning stale candidates.

Pass criteria: IndexUnavailableError raised; no candidates returned. Fail criteria: any candidate returned.
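One way to detect the out-of-band replacement in step 2 is to fingerprint the index file when the handle is opened and re-check it before every lookup. The sketch below assumes that approach; `GuardedIndexHandle` and `ensure_fresh` are hypothetical names, plain bytes stand in for a real FAISS index, and only `IndexUnavailableError` is named by this specification. How C2 actually detects staleness is not stated here.

```python
import os
import tempfile


class IndexUnavailableError(RuntimeError):
    """Raised when the index file no longer matches the one that was opened."""


class GuardedIndexHandle:
    def __init__(self, path: str) -> None:
        self._path = path
        self._fingerprint = self._stat_fingerprint()

    def _stat_fingerprint(self):
        st = os.stat(self._path)
        return (st.st_ino, st.st_size, st.st_mtime_ns)

    def ensure_fresh(self) -> None:
        """Call before every lookup; raises instead of serving stale candidates."""
        try:
            current = self._stat_fingerprint()
        except FileNotFoundError as exc:
            raise IndexUnavailableError("index file is gone") from exc
        if current != self._fingerprint:
            raise IndexUnavailableError("index file changed since it was opened")


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "tiny.index")
    with open(index_path, "wb") as fh:
        fh.write(b"original-index-bytes")
    handle = GuardedIndexHandle(index_path)
    handle.ensure_fresh()  # untouched file: no exception

    # Simulate the out-of-band rebuild by atomically swapping in a new file.
    replacement = os.path.join(tmp, "rebuilt.index")
    with open(replacement, "wb") as fh:
        fh.write(b"rebuilt-index-bytes-with-different-length")
    os.replace(replacement, index_path)

    try:
        handle.ensure_fresh()
        stale_detected = False
    except IndexUnavailableError:
        stale_detected = True
```

A stat-based fingerprint is cheap enough to run per query, but it would miss an in-place rewrite that preserves size and modification time; a content hash recorded at build time would be a stronger check at higher cost.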


Acceptance Tests

C2 contributes to AC-2.1b / AC-8.6 via the suite-level FT scenarios. No additional C2-only acceptance tests.


Test Data Management

| Data Set | Description | Source | Size |
|---|---|---|---|
| synthetic_vpr/diverse_100f/ | 100 diverse synthetic frames for invariant checks | generated, deterministic | ~50 MB |
| flight_derkachi/normal_segment_60_stills/ | shared with C1 | curated | shared |
| flight_derkachi/scaled_±20%/ | scale-ratio sweep | generated from above | ~40 MB |
| Derkachi FAISS corpus | C10's output, consumed read-only | C10 build artifact | ~200 MB |

Setup procedure:

  1. C10 must have built the Derkachi FAISS index (this is a Step-5-test-side prereq; in the test runner, the corpus is staged from tests/fixtures/cache_artifacts/).
  2. Synthetic + scaled fixtures generated by deterministic scripts.

Teardown: corpus is read-only; nothing to clean up.

Data isolation: each test gets its own tests/tmp/c2/<test-id>/.