add tests

gen_tests updated solution.md updated
2026-04-23 02:56:38 +00:00 · 2025-11-24 22:57:46 +02:00
parent f50006d100
commit 4f8c18a066
49 changed files with 7209 additions and 3 deletions
@@ -0,0 +1,136 @@
+# Acceptance Test: Outlier Anchor Detection (<10%)
+
+## Summary
+Validate the AC-5 requirement that the system detects and rejects outlier global anchor points (bad satellite matches from L3), keeping the outlier rate below 10%.
+
+## Linked Acceptance Criteria
+**AC-5**: Less than 10% outlier anchors. The system can tolerate some incorrect global anchor points (bad satellite matches) but must keep them below the 10% threshold through validation and rejection mechanisms.
+
+## Preconditions
+- ASTRAL-Next system operational
+- L3 LiteSAM cross-view matching active
+- Factor graph with robust M-estimation
+- Validation mechanisms enabled (geometric consistency, residual analysis)
+
+## Test Data
+- **Dataset**: AD000001-AD000060 (60 images)
+- **Expected Anchors**: ~20-30 global anchor attempts (not every frame needs L3)
+- **Acceptable Outliers**: <3 outlier anchors (<10% of 30)
+- **Challenge**: Potential satellite data staleness, seasonal differences
+
+## Test Steps
+
+### Step 1: Process Flight with L3 Anchoring
+**Action**: Process full flight with L3 metric refinement active
+**Expected Result**:
+- L2 retrieves satellite tiles for keyframes
+- L3 LiteSAM performs cross-view matching
+- Global anchor factors added to factor graph
+- Anchor count: 20-30 across 60 images
+- Status: PROCESSING_WITH_ANCHORS
+
+### Step 2: Monitor Anchor Quality Metrics
+**Action**: Track L3 matching confidence and geometric consistency
+**Expected Result**:
+- Each anchor has confidence score (0-1)
+- Each anchor has initial residual error
+- Anchors with confidence <0.3 flagged as suspicious
+- Anchors with residual >3σ flagged as outliers
+- Status: MONITORING_QUALITY
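
The two flagging rules in Step 2 can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Anchor` record, its field names, and `flag_anchors` are hypothetical, and σ is assumed to be the population standard deviation of all anchor residuals, measured around their mean.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

SUSPICIOUS_CONFIDENCE = 0.3  # Step 2: confidence below this is suspicious
SIGMA_MULTIPLIER = 3.0       # Step 2: residual beyond 3 sigma is an outlier


@dataclass
class Anchor:
    """Hypothetical anchor record; field names are illustrative only."""
    frame_id: str
    confidence: float  # L3 matching confidence in [0, 1]
    residual_m: float  # initial residual error in metres


def flag_anchors(anchors):
    """Return {frame_id: (suspicious, outlier)} for a non-empty anchor list."""
    residuals = [a.residual_m for a in anchors]
    mu = mean(residuals)
    sigma = pstdev(residuals)  # assumption: population std over all anchors
    flags = {}
    for a in anchors:
        suspicious = a.confidence < SUSPICIOUS_CONFIDENCE
        # With sigma == 0 all residuals are identical, so nothing is an outlier.
        outlier = sigma > 0 and abs(a.residual_m - mu) > SIGMA_MULTIPLIER * sigma
        flags[a.frame_id] = (suspicious, outlier)
    return flags
```

Because the mean and σ here include the outliers themselves, several large residuals can mask each other; a median-based spread estimate would be one alternative if that turns out to matter.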
+
+### Step 3: Identify Potential Outlier Anchors
+**Action**: Analyze anchors that conflict with trajectory consensus
+**Expected Result**:
+```
+Total anchors: 25 (example)
+High confidence (>0.7): 20
+Medium confidence (0.4-0.7): 3
+Low confidence (<0.4): 2
+Flagged as outliers: 2 (<10%)
+```
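
A summary like the one above could be produced along these lines. The bucket boundaries follow the example (high >0.7, medium 0.4-0.7, low <0.4); the function names and the returned dictionary layout are assumptions made for illustration.

```python
from collections import Counter


def confidence_bucket(confidence):
    """Map a confidence score to the buckets used in the Step 3 example."""
    if confidence > 0.7:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"


def summarize_anchors(confidences, flagged_count):
    """Count anchors per bucket and compute the flagged-outlier rate."""
    buckets = Counter(confidence_bucket(c) for c in confidences)
    total = len(confidences)
    rate = flagged_count / total if total else 0.0
    return {
        "total": total,
        "high": buckets["high"],
        "medium": buckets["medium"],
        "low": buckets["low"],
        "outlier_rate": rate,
        "below_10_percent": rate < 0.10,  # AC-5 threshold
    }
```

With the example figures (25 anchors, 2 flagged) the rate is 2/25 = 8%, which satisfies AC-5.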
+
+### Step 4: Validate Outlier Rejection Mechanism
+**Action**: Verify factor graph handling of outlier anchors
+**Expected Result**:
+- Outlier anchors automatically down-weighted by robust kernel
+- Outlier anchor residuals remain high after optimization (the trajectory is not dragged toward them)
+- Non-outlier anchors maintain weight ~1.0
+- Factor graph converges despite outlier anchors present
+- Status: OUTLIERS_HANDLED
+
+### Step 5: Test Explicit Outlier Anchor Scenario
+**Action**: Manually inject known bad anchor (simulated wrong satellite tile match)
+**Expected Result**:
+- Bad anchor creates large residual (>100m error)
+- Geometric validation detects inconsistency
+- Robust cost function down-weights bad anchor
+- Bad anchor does NOT corrupt trajectory
+- Status: SYNTHETIC_OUTLIER_REJECTED
+
+### Step 6: Calculate Final Anchor Statistics
+**Action**: Analyze all anchor attempts and outcomes
+**Expected Result**:
+```
+Total anchor attempts: 25-30
+Successful anchors: 23-27 (90-95%)
+Outlier anchors: 0-2 (<10%)
+Outlier detection rate: 100% (all caught)
+False positive rate: <5% (good anchors not rejected)
+Trajectory accuracy: Improved by valid anchors
+AC-5 Status: PASS
+```
+
+## Pass/Fail Criteria
+
+**PASS if**:
+- Outlier anchor rate <10% of total anchor attempts
+- All significant outliers (>100m error) detected and down-weighted
+- Factor graph converges with MRE <1.5px
+- Valid anchors improve trajectory accuracy vs L1-only
+- No trajectory corruption from outlier anchors
+
+**FAIL if**:
+- Outlier anchor rate ≥10%
+- Any outlier anchor corrupts trajectory (causes >50m error propagation)
+- Outlier detection fails (outliers not flagged)
+- Factor graph diverges due to conflicting anchors
+- Valid anchors incorrectly rejected (>10% false positive rate)
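
The PASS conditions can be checked mechanically once a run has been summarized. The sketch below is hypothetical throughout: `RunSummary`, its fields, and `evaluate_ac5` are invented for illustration, and only the thresholds (10%, 1.5px) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-run results; field names are illustrative only."""
    anchor_attempts: int
    outlier_anchors: int
    undetected_large_outliers: int  # >100 m outliers that were not down-weighted
    converged: bool
    mre_px: float
    improves_on_l1_only: bool
    corrupting_outliers: int        # outliers that corrupted the trajectory


def evaluate_ac5(s):
    """Return (passed, failures) by checking each PASS condition in turn."""
    failures = []
    if s.anchor_attempts <= 0 or s.outlier_anchors / s.anchor_attempts >= 0.10:
        failures.append("outlier rate not below 10%")
    if s.undetected_large_outliers > 0:
        failures.append("significant outlier not detected and down-weighted")
    if not s.converged or s.mre_px >= 1.5:
        failures.append("factor graph did not converge with MRE <1.5px")
    if not s.improves_on_l1_only:
        failures.append("anchors do not improve accuracy vs L1-only")
    if s.corrupting_outliers > 0:
        failures.append("trajectory corrupted by outlier anchors")
    return (not failures, failures)
```

Returning the list of failed conditions, rather than a bare boolean, makes a failing run easier to diagnose.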
+
+## Outlier Detection Mechanisms Tested
+
+### Geometric Consistency Check
+- Compare anchor position with L1 trajectory estimate
+- Flag if discrepancy >100m
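
As a rough sketch of this check, assuming both positions are already expressed in a shared local metric frame (east/north in metres), the comparison reduces to a Euclidean distance test. The function name and the coordinate convention are assumptions; only the 100m threshold comes from the text.

```python
import math

MAX_DISCREPANCY_M = 100.0  # threshold from the geometric consistency check


def is_geometrically_inconsistent(anchor_xy, l1_xy, threshold_m=MAX_DISCREPANCY_M):
    """Flag an anchor whose position is too far from the L1 trajectory estimate.

    Both arguments are (east, north) tuples in metres in the same local frame.
    """
    dx = anchor_xy[0] - l1_xy[0]
    dy = anchor_xy[1] - l1_xy[1]
    return math.hypot(dx, dy) > threshold_m
```

If the anchors are stored as latitude/longitude instead, they would need to be projected into a metric frame before this comparison.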
+
+### Residual Analysis
+- Monitor residual error in factor graph optimization
+- Flag if residual >3σ from mean anchor residual
+
+### Confidence Thresholding
+- L3 LiteSAM outputs matching confidence
+- Reject anchors with confidence <0.2
+
+### Robust M-Estimation
+- Cauchy/Huber kernel automatically down-weights high-residual anchors
+- Prevents outliers from corrupting optimization
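
To show what "down-weights" means here, these are the standard weight functions for the Huber and Cauchy kernels as they appear in iteratively reweighted least squares. The residual `r` is assumed to be normalized (divided by its expected standard deviation). The tuning constants are commonly used defaults, not values taken from this project.

```python
def huber_weight(r, k=1.345):
    """Huber weight: 1 inside the threshold k, k/|r| beyond it."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def cauchy_weight(r, c=2.3849):
    """Cauchy weight: decays smoothly as 1 / (1 + (r/c)^2)."""
    return 1.0 / (1.0 + (r / c) ** 2)
```

Under either kernel a small residual keeps a weight near 1.0, while a residual many times the threshold is weighted close to zero. The Cauchy weight falls off faster, so it suppresses gross outliers more aggressively than Huber.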
+
+## Technical Validation Metrics
+- **Anchor Attempt Rate**: 30-50% of frames (keyframes only)
+- **Anchor Success Rate**: 90-95%
+- **Outlier Rate**: <10% (AC-5 requirement)
+- **Detection Sensitivity**: >95% (outliers caught)
+- **Detection Specificity**: >90% (valid anchors retained)
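
Sensitivity and specificity follow from the usual confusion counts, treating "outlier" as the positive class. A small sketch, with a hypothetical function name and argument order:

```python
def detection_metrics(tp, fn, tn, fp):
    """Compute (sensitivity, specificity) for outlier detection.

    tp: true outliers that were flagged     fn: true outliers that were missed
    tn: valid anchors that were retained    fp: valid anchors wrongly flagged
    Returns None for a metric whose denominator is zero.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

For instance, with 2 outliers both caught and 1 of 23 valid anchors wrongly flagged, sensitivity is 2/2 and specificity is 22/23, which meets both targets listed above.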
+
+## Failure Modes Tested
+- **Wrong Satellite Tile**: L2 retrieves incorrect location
+- **Stale Satellite Data**: Terrain changed significantly
+- **Seasonal Mismatch**: Summer satellite vs winter UAV imagery
+- **Rotation Error**: L3 estimates incorrect rotation
+
+## Notes
+- AC-5 is critical for hybrid localization reliability
+- 10% outlier tolerance allows graceful degradation
+- Robust M-estimation is the primary outlier defense
+- Multiple validation layers provide defense-in-depth
+- Valid anchors significantly improve absolute accuracy
+