Update autodev state, architecture documentation, and glossary terms

Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
Oleksandr Bezdieniezhnykh
2026-05-10 00:21:34 +03:00
parent 723f574b14
commit 64542d32fc
52 changed files with 8789 additions and 88 deletions
@@ -0,0 +1,144 @@
# Test Specification — C3 Cross-Domain Matcher
Component-scoped. Suite-level coverage in `_docs/02_document/tests/*.md`.
## Acceptance Criteria Traceability
| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01, **C3-IT-01** (inlier-budget partition) | Covered |
| AC-2.1b | Satellite-anchor registration | FT-P-05, **C3-IT-02** | Covered |
| AC-2.2 (cross-domain portion) | MRE <2.5 px cross-domain | FT-P-06, **C3-IT-03** | Covered |
| AC-3.1 | Tolerate up to 350 m outliers, tilt ±20° | FT-N-01, **C3-IT-04** | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01, **C3-PT-01** | Covered |
---
## Component-Internal Tests
### C3-IT-01: per-candidate inlier-count floor on Derkachi

**Summary**: on the Derkachi normal segment, the best candidate's RANSAC inlier count is at least the configured floor (default ≥ 80 inliers per frame).

**Traces to**: AC-1.1 (component-level partition feeding AC-1.1's accuracy budget)

**Description**: for the Derkachi normal-segment fixture, run C3 with the production-default DISK+LightGlue backbone; record `MatchResult.per_candidate[best_candidate_idx].inlier_count`; assert p5 ≥ 80 (i.e., ≥95% of frames clear the floor).

**Input data**: `flight_derkachi/normal_segment_60_stills/` + C10-built tile descriptors (read-only).

**Expected result**: p5 inlier count ≥ 80 for DISK+LightGlue.

**Max execution time**: 4 min on Tier-1 (CPU fallback) / 90 s on Tier-2.

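
The following is a minimal sketch of the C3-IT-01 assertion. It assumes the harness has already collected the best candidate's inlier count for each frame. The helper names are hypothetical. The nearest-rank percentile is also an assumed choice: the spec does not say which percentile definition applies, and definitions can differ by one frame on a 60-still fixture.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the value at 1-based rank ceil(p/100 * n) in sorted order."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # multiply first to avoid float drift
    return ordered[rank - 1]


def check_inlier_floor(best_inlier_counts: Sequence[int], floor: int = 80) -> None:
    """C3-IT-01 (hypothetical helper): p5 of best-candidate inlier counts must clear the floor."""
    p5 = nearest_rank_percentile(best_inlier_counts, 5)
    assert p5 >= floor, f"p5 inlier count {p5} is below the floor of {floor}"
```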
---
### C3-IT-02: best-candidate selection determinism

**Summary**: `best_candidate_idx == argmax(inlier_count)` for every emitted `MatchResult`.

**Traces to**: AC-2.1b

**Description**: 100 frames through `match`; assert (a) `best_candidate_idx` equals the index of the largest `inlier_count` in `per_candidate`, (b) ties are broken deterministically (lowest tile_id wins; check against a known-tie synthetic fixture).

**Input data**: `synthetic_matcher/known_tie_10f/` (synthetic frames where two candidates have identical inlier counts).

**Expected result**: deterministic tie-breaking; no `best_candidate_idx` mismatch in 100/100 frames.

**Max execution time**: 60 s.

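
The following sketch shows the selection rule that C3-IT-02 checks. `CandidateMatch` is a hypothetical stand-in for the entries of `MatchResult.per_candidate`, and an integer `tile_id` is an assumption. A plain first-index `argmax` satisfies the lowest-`tile_id` tie-break only when `per_candidate` is already ordered by `tile_id`. The sketch therefore selects on an explicit `(-inlier_count, tile_id)` key.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CandidateMatch:
    """Hypothetical stand-in for one entry of MatchResult.per_candidate."""
    tile_id: int
    inlier_count: int


def select_best_candidate(per_candidate: Sequence[CandidateMatch]) -> int:
    """Index of the candidate with the most inliers; ties go to the lowest tile_id."""
    if not per_candidate:
        raise ValueError("no candidates to choose from")
    # Smallest key wins: more inliers first (negated), then lower tile_id.
    return min(
        range(len(per_candidate)),
        key=lambda i: (-per_candidate[i].inlier_count, per_candidate[i].tile_id),
    )


def check_selection(per_candidate: Sequence[CandidateMatch], best_candidate_idx: int) -> None:
    """C3-IT-02 assertions (a) and (b) for one emitted MatchResult."""
    expected = select_best_candidate(per_candidate)
    assert best_candidate_idx == expected, f"expected index {expected}, got {best_candidate_idx}"
```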
---
### C3-IT-03: cross-domain MRE bound

**Summary**: the 95th-percentile per-frame reprojection residual stays under 2.5 px for the production-default matcher on the Derkachi normal segment.

**Traces to**: AC-2.2

**Description**: same Derkachi fixture; record `MatchResult.reprojection_residual_px`; assert p95 < 2.5 px.

**Input data**: as C3-IT-01.

**Expected result**: p95 < 2.5 px for DISK+LightGlue. ALIKED+LightGlue (secondary) and XFeat (alternate) are tested on a smoke subset only; the comparative-study verdict belongs to IT-12.

**Max execution time**: 4 min.

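
The following sketch shows one plausible reading of `reprojection_residual_px`. It treats the residual as the mean pixel distance between frame inliers projected through the estimated 3×3 homography and their matched tile keypoints. The spec does not define the residual, so the homography model, the helper names, and the nearest-rank p95 are all hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3, row-major


def project(h: Homography, p: Point) -> Point:
    """Apply a 3x3 homography to a 2-D pixel coordinate."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def mean_reprojection_error(h: Homography, src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean pixel distance between projected frame inliers and their matched tile keypoints."""
    if len(src) != len(dst) or not src:
        raise ValueError("need equal, non-empty inlier correspondence lists")
    total = 0.0
    for s, d in zip(src, dst):
        px, py = project(h, s)
        total += math.hypot(px - d[0], py - d[1])
    return total / len(src)


def check_mre_bound(per_frame_residuals_px: Sequence[float], bound_px: float = 2.5) -> None:
    """C3-IT-03: nearest-rank p95 of per-frame residuals must stay strictly below the bound."""
    if not per_frame_residuals_px:
        raise ValueError("no residuals recorded")
    ordered = sorted(per_frame_residuals_px)
    p95 = ordered[math.ceil(95 * len(ordered) / 100) - 1]
    assert p95 < bound_px, f"p95 residual {p95:.2f} px >= {bound_px} px"
```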
---
### C3-IT-04: tilt + outlier robustness

**Summary**: under ±20° tilt and synthetic 350 m outliers, the matcher still produces a usable inlier set (≥40 inliers).

**Traces to**: AC-3.1

**Description**: synthetically tilt the Derkachi frames by {-20°, -10°, 0°, +10°, +20°}; inject 350 m position outliers into the candidate tile metadata for 5% of frames; assert C3 emits a `MatchResult` with `per_candidate[best_candidate_idx].inlier_count ≥ 40` for ≥90% of frames in each tilt bucket.

**Input data**: `flight_derkachi/tilted_±20°/` (deterministic synthetic tilt).

**Expected result**: per-bucket inlier-count p10 ≥ 40 for DISK+LightGlue.

**Max execution time**: 6 min.

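
The following sketch covers the two mechanical parts of C3-IT-04, under stated assumptions. The first part picks a reproducible 5% of frames for the 350 m outlier. It uses a random bearing from a seeded `random.Random`, with offsets expressed as east/north metres. The second part is the per-bucket ≥90% check. The function names and the offset representation are hypothetical, and applying the offset to the tile metadata is left to the harness.

```python
import math
import random
from typing import Dict, Mapping, Sequence, Tuple

TILT_BUCKETS_DEG = (-20, -10, 0, 10, 20)


def outlier_offsets(frame_ids: Sequence[str], fraction: float = 0.05,
                    offset_m: float = 350.0, seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Deterministically pick ~fraction of frames; give each an (east, north) shift of offset_m."""
    rng = random.Random(seed)  # fixed seed: same frames and bearings on every run
    chosen = rng.sample(list(frame_ids), round(fraction * len(frame_ids)))
    offsets: Dict[str, Tuple[float, float]] = {}
    for fid in chosen:
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        offsets[fid] = (offset_m * math.sin(bearing), offset_m * math.cos(bearing))
    return offsets


def check_tilt_buckets(inliers_by_bucket: Mapping[int, Sequence[int]],
                       min_inliers: int = 40, min_percent: int = 90) -> None:
    """C3-IT-04: in each tilt bucket, at least min_percent% of frames keep >= min_inliers inliers."""
    for tilt in TILT_BUCKETS_DEG:
        counts = inliers_by_bucket[tilt]
        assert counts, f"no frames in tilt bucket {tilt:+d}"
        passing = sum(1 for c in counts if c >= min_inliers)
        # Integer comparison avoids float rounding in "passing / len >= 0.9".
        assert 100 * passing >= min_percent * len(counts), (
            f"tilt {tilt:+d} deg: {passing}/{len(counts)} frames reach {min_inliers} inliers"
        )
```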
---
### C3-IT-05: `InsufficientInliersError` propagation

**Summary**: when all N=3 candidates fall below the inlier floor, C3 raises `InsufficientInliersError` and emits no `MatchResult`.

**Traces to**: AC-3.5 (defensive: keeps the spoof-fallback path clean)

**Description**: synthetic frames with deliberately mismatched candidate tiles (cross-region pulls); assert `match` raises `InsufficientInliersError` for every frame and the downstream consumer (C3.5 / C4) receives no `MatchResult`.

**Input data**: `synthetic_matcher/cross_region_mismatch_20f/`.

**Expected result**: 20/20 frames raise the error.

**Max execution time**: 60 s.

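
The following is a minimal sketch of the contract C3-IT-05 exercises. When every candidate is below the floor, the last step of `match` raises `InsufficientInliersError` instead of returning a `MatchResult`. Only the `InsufficientInliersError` and `MatchResult` names come from this spec. The dataclass fields, `finalize_match`, and the integer `tile_id` are hypothetical. The floor is a required argument because the spec does not pin down which configured floor applies here.

```python
from dataclasses import dataclass
from typing import List, Sequence


class InsufficientInliersError(RuntimeError):
    """Raised when no candidate tile reaches the inlier floor."""


@dataclass(frozen=True)
class CandidateMatch:
    tile_id: int
    inlier_count: int


@dataclass(frozen=True)
class MatchResult:
    per_candidate: List[CandidateMatch]
    best_candidate_idx: int


def finalize_match(per_candidate: Sequence[CandidateMatch], floor: int) -> MatchResult:
    """Hypothetical last step of `match`: raise instead of emitting a below-floor result."""
    if not per_candidate or all(c.inlier_count < floor for c in per_candidate):
        raise InsufficientInliersError(
            f"all {len(per_candidate)} candidates are below {floor} inliers"
        )
    # Same deterministic rule as C3-IT-02: most inliers, then lowest tile_id.
    best = min(range(len(per_candidate)),
               key=lambda i: (-per_candidate[i].inlier_count, per_candidate[i].tile_id))
    return MatchResult(per_candidate=list(per_candidate), best_candidate_idx=best)
```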
---
## Performance Tests
### C3-PT-01: per-frame match latency on Tier-2 (dominant cost)

**Traces to**: AC-4.1 (C3 owns the largest single partition of the budget)

**Load scenario**: 3 Hz, N=3 candidates, 10 min replay; concurrent C2 backbone + C5 iSAM2 update on the same Jetson.

**Expected results**:

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| `match` p95 | ≤ 180 ms (DISK+LightGlue) | 280 ms |
| Per-candidate p95 | ≤ 60 ms | 95 ms |
| Throughput | ≥ 3 Hz sustained | < 2.5 Hz |

**Resource limits**:

- GPU memory: ≤ 800 MB for backbone + matcher engines combined.

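
The following sketch shows how a harness might grade C3-PT-01 samples against the table. It assumes nearest-rank p95, and it assumes a metric fails only when it is strictly worse than its failure threshold, because the table does not state boundary handling for the latency rows. The final assertion is budget arithmetic. If the three candidates are matched one after another (an assumption), 3 × 60 ms exactly consumes the 180 ms `match` target, and that target fits inside the ~333 ms frame period at 3 Hz.

```python
import math
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (an assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


def grade(value: float, target: float, failure: float, lower_is_better: bool = True) -> str:
    """'pass' if the target is met, 'fail' if strictly past the failure threshold, else 'warn'."""
    if lower_is_better:
        if value <= target:
            return "pass"
        return "fail" if value > failure else "warn"
    if value >= target:
        return "pass"
    return "fail" if value < failure else "warn"


# Budget arithmetic, assuming the N=3 candidates are matched sequentially:
# 3 x 60 ms fits the 180 ms `match` target, which fits the ~333 ms frame period at 3 Hz.
N_CANDIDATES, PER_CANDIDATE_TARGET_MS, MATCH_TARGET_MS, RATE_HZ = 3, 60.0, 180.0, 3.0
assert N_CANDIDATES * PER_CANDIDATE_TARGET_MS <= MATCH_TARGET_MS < 1000.0 / RATE_HZ
```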
---
## Security Tests
C3 has no externally reachable surface; defensive coverage comes from the cache-poisoning test (NFT-SEC-01) and the shared-runtime invariants (C2.5-IT-03).

---
## Acceptance Tests
Covered transitively via FT-P-01, FT-P-05, FT-P-06.

---
## Test Data Management
| Data Set | Source | Size |
|----------|--------|------|
| `flight_derkachi/normal_segment_60_stills/` | shared | shared |
| `flight_derkachi/tilted_±20°/` | generated | ~150 MB |
| `synthetic_matcher/known_tie_10f/` | generated | ~5 MB |
| `synthetic_matcher/cross_region_mismatch_20f/` | generated | ~10 MB |
| C10-built tile descriptors | C10 artifact | shared |

**Setup**: C10 must have built the tile descriptors, and the matchers' TRT engines must be compiled (about 5 min on the first Tier-2 run; cached afterwards).

**Teardown**: none; all inputs are read-only.

**Data isolation**: per-test temp dirs.