add tests

gen_tests updated
solution.md updated
Oleksandr Bezdieniezhnykh
2025-11-24 22:57:46 +02:00
parent f50006d100
commit 4f8c18a066
49 changed files with 7209 additions and 3 deletions
@@ -0,0 +1,136 @@
# Acceptance Test: Outlier Anchor Detection (<10%)
## Summary
Validate the AC-5 requirement that the system detects and rejects outlier global anchor points (bad satellite matches from L3), keeping the outlier rate below 10%.
## Linked Acceptance Criteria
**AC-5**: Less than 10% outlier anchors. The system can tolerate some incorrect global anchor points (bad satellite matches) but must keep them below the 10% threshold through validation and rejection mechanisms.
## Preconditions
- ASTRAL-Next system operational
- L3 LiteSAM cross-view matching active
- Factor graph with robust M-estimation
- Validation mechanisms enabled (geometric consistency, residual analysis)
## Test Data
- **Dataset**: AD000001-AD000060 (60 images)
- **Expected Anchors**: ~20-30 global anchor attempts (not every frame needs L3)
- **Acceptable Outliers**: <3 outlier anchors at 30 anchors (<10%); the limit scales with the actual anchor count (e.g. <2 at 20 anchors)
- **Challenge**: Potential satellite data staleness, seasonal differences
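The outlier budget depends on how many anchors the run actually produces. A minimal sketch of that arithmetic, assuming the limit is strictly below 10% and using integer math to avoid floating-point edge cases (the function name is hypothetical, not part of the test harness):

```python
def max_allowed_outliers(total_anchors: int, limit_percent: int = 10) -> int:
    """Largest outlier count whose rate stays strictly below limit_percent."""
    if total_anchors <= 0:
        return 0
    # outliers * 100 < total_anchors * limit_percent, solved for integer outliers
    return (total_anchors * limit_percent - 1) // 100


for n in (20, 25, 30):
    print(n, "anchors ->", max_allowed_outliers(n), "outliers allowed")
```

At the lower end of the expected range (20 anchors) only one outlier fits within the budget.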
## Test Steps
### Step 1: Process Flight with L3 Anchoring
**Action**: Process full flight with L3 metric refinement active
**Expected Result**:
- L2 retrieves satellite tiles for keyframes
- L3 LiteSAM performs cross-view matching
- Global anchor factors added to factor graph
- Anchor count: 20-30 across 60 images
- Status: PROCESSING_WITH_ANCHORS
### Step 2: Monitor Anchor Quality Metrics
**Action**: Track L3 matching confidence and geometric consistency
**Expected Result**:
- Each anchor has confidence score (0-1)
- Each anchor has initial residual error
- Anchors with confidence <0.3 flagged as suspicious
- Anchors with residual >3σ flagged as outliers
- Status: MONITORING_QUALITY
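The two flags in this step could be computed roughly as follows. This is a sketch only: the `Anchor` record and `flag_anchors` helper are hypothetical, residuals are assumed to be in metres, and "3σ" is read here as three population standard deviations above the mean anchor residual.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Anchor:
    frame_id: str
    confidence: float  # L3 matching confidence in [0, 1]
    residual_m: float  # initial residual error (assumed unit: metres)


def flag_anchors(anchors, suspicious_conf=0.3, n_sigma=3.0):
    """Return {frame_id: [labels]} using the Step 2 thresholds."""
    residuals = [a.residual_m for a in anchors]
    mu = mean(residuals)
    sigma = pstdev(residuals)
    flags = {}
    for a in anchors:
        labels = []
        if a.confidence < suspicious_conf:
            labels.append("suspicious")
        # Skip the residual test when all residuals are identical (sigma == 0).
        if sigma > 0 and a.residual_m > mu + n_sigma * sigma:
            labels.append("outlier")
        flags[a.frame_id] = labels
    return flags
```

Because a gross outlier inflates both the mean and σ, this test is only meaningful once enough anchors have accumulated; with a handful of anchors a single bad match cannot exceed 3σ.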
### Step 3: Identify Potential Outlier Anchors
**Action**: Analyze anchors that conflict with trajectory consensus
**Expected Result**:
```
Total anchors: 25 (example)
High confidence (>0.7): 20
Medium confidence (0.4-0.7): 3
Low confidence (<0.4): 2
Flagged as outliers: 2 (<10%)
```
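A breakdown like the one above can be produced by bucketing the per-anchor confidence scores. The sketch below assumes the boundaries shown in the example (high above 0.7, medium from 0.4 to 0.7 inclusive, low below 0.4); the function names are illustrative.

```python
from collections import Counter


def confidence_bucket(confidence: float) -> str:
    """Map an L3 confidence score to the bucket names used in the report."""
    if confidence > 0.7:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"


def summarize(confidences):
    """Count anchors per bucket, always returning all three keys."""
    counts = Counter(confidence_bucket(c) for c in confidences)
    return {name: counts.get(name, 0) for name in ("high", "medium", "low")}


print(summarize([0.9] * 20 + [0.5] * 3 + [0.1] * 2))
```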
### Step 4: Validate Outlier Rejection Mechanism
**Action**: Verify factor graph handling of outlier anchors
**Expected Result**:
- Outlier anchors automatically down-weighted by robust kernel
- Outlier anchor residuals remain high (the optimized trajectory is not dragged toward them)
- Non-outlier anchors maintain weight ~1.0
- Factor graph converges despite outlier anchors present
- Status: OUTLIERS_HANDLED
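The expected weights can be illustrated with the standard iteratively reweighted least squares (IRLS) weight functions for the Huber and Cauchy kernels. This sketch assumes residuals are already normalized by the measurement sigma and uses the commonly quoted tuning constants (1.345 and 2.385); the system's actual kernel parameters are not stated in this document.

```python
def huber_weight(residual: float, k: float = 1.345) -> float:
    """Huber IRLS weight: 1 inside the threshold, k/|r| outside."""
    r = abs(residual)
    return 1.0 if r <= k else k / r


def cauchy_weight(residual: float, c: float = 2.385) -> float:
    """Cauchy IRLS weight: decays smoothly as 1 / (1 + (r/c)^2)."""
    return 1.0 / (1.0 + (residual / c) ** 2)


for r in (0.5, 3.0, 50.0):
    print(f"r={r:5.1f}  huber={huber_weight(r):.3f}  cauchy={cauchy_weight(r):.3f}")
```

Small residuals keep a weight at or near 1.0, while a grossly wrong anchor ends up with a weight close to zero, which matches the behaviour this step checks for.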
### Step 5: Test Explicit Outlier Anchor Scenario
**Action**: Manually inject known bad anchor (simulated wrong satellite tile match)
**Expected Result**:
- Bad anchor creates large residual (>100m error)
- Geometric validation detects inconsistency
- Robust cost function down-weights bad anchor
- Bad anchor does NOT corrupt trajectory
- Status: SYNTHETIC_OUTLIER_REJECTED
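The effect of the injected anchor can be reproduced on a toy problem. The sketch below is a one-dimensional stand-in for the factor graph, not the real optimizer: nine plausible anchor measurements plus one 150 m error, estimated once with a plain mean and once with Huber-weighted IRLS. All values and the function name are made up for illustration.

```python
from statistics import mean, median


def huber_irls_location(measurements, k=1.345, sigma=1.0, iterations=20):
    """Estimate a 1-D position with Huber-weighted IRLS."""
    estimate = median(measurements)  # robust starting point
    for _ in range(iterations):
        weights = []
        for z in measurements:
            r = abs(z - estimate) / sigma
            weights.append(1.0 if r <= k else k / r)
        estimate = sum(w * z for w, z in zip(weights, measurements)) / sum(weights)
    return estimate


good = [-1.0, 0.5, 0.0, 1.0, -0.5, 0.2, -0.2, 0.8, -0.8]  # metres, synthetic
with_bad = good + [150.0]                                   # injected bad anchor

naive = mean(with_bad)
robust = huber_irls_location(with_bad)
print(f"plain mean: {naive:.2f} m, robust estimate: {robust:.2f} m")
```

The plain mean is pulled about 15 m toward the bad anchor, while the robust estimate stays within a fraction of a metre of the good measurements.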
### Step 6: Calculate Final Anchor Statistics
**Action**: Analyze all anchor attempts and outcomes
**Expected Result**:
```
Total anchor attempts: 25-30
Successful anchors: 23-27 (90-95%)
Outlier anchors: 0-2 (<10%)
Outlier detection rate: 100% (all caught)
False positive rate: <5% (good anchors not rejected)
Trajectory accuracy: Improved by valid anchors
AC-5 Status: PASS
```
## Pass/Fail Criteria
**PASS if**:
- Outlier anchor rate <10% of total anchor attempts
- All significant outliers (>100m error) detected and down-weighted
- Factor graph converges with MRE <1.5px
- Valid anchors improve trajectory accuracy vs L1-only
- No trajectory corruption from outlier anchors
**FAIL if**:
- Outlier anchor rate ≥10%
- Any outlier anchor corrupts trajectory (causes >50m error propagation)
- Outlier detection fails (outliers not flagged)
- Factor graph diverges due to conflicting anchors
- Valid anchors incorrectly rejected (>10% false positive rate)
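These criteria could be evaluated mechanically from a run summary. The sketch below is one possible encoding, with a hypothetical `RunSummary` record; it follows the PASS list, so any corrupting outlier counts as a failure, and the outlier rate is compared with integer arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    anchor_attempts: int
    outlier_anchors: int
    undetected_significant_outliers: int  # >100 m error and not flagged
    corrupting_outliers: int              # caused >50 m error propagation
    converged: bool
    mre_px: float
    improves_over_l1: bool
    false_positive_rate: float            # fraction of valid anchors rejected


def ac5_failures(s: RunSummary) -> list[str]:
    """Return the list of violated criteria; an empty list means PASS."""
    failures = []
    if s.anchor_attempts <= 0:
        failures.append("no anchor attempts")
    elif s.outlier_anchors * 10 >= s.anchor_attempts:
        failures.append("outlier rate not below 10%")
    if s.undetected_significant_outliers > 0:
        failures.append("significant outlier not detected")
    if s.corrupting_outliers > 0:
        failures.append("trajectory corrupted by outlier anchor")
    if not s.converged or s.mre_px >= 1.5:
        failures.append("factor graph did not converge with MRE < 1.5 px")
    if not s.improves_over_l1:
        failures.append("no accuracy gain over L1-only")
    if s.false_positive_rate > 0.10:
        failures.append("false positive rate above 10%")
    return failures
```

Returning the violated criteria rather than a bare boolean makes a failed run easier to triage.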
## Outlier Detection Mechanisms Tested
### Geometric Consistency Check
- Compare anchor position with L1 trajectory estimate
- Flag if discrepancy >100m
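A minimal sketch of this check, assuming both the anchor position and the L1 estimate are expressed in a shared local east/north frame in metres (the coordinate convention and function name are assumptions, not taken from the system):

```python
import math


def is_geometrically_consistent(anchor_en, l1_en, max_discrepancy_m=100.0):
    """True if the anchor lies within max_discrepancy_m of the L1 estimate."""
    d_east = anchor_en[0] - l1_en[0]
    d_north = anchor_en[1] - l1_en[1]
    return math.hypot(d_east, d_north) <= max_discrepancy_m


print(is_geometrically_consistent((30.0, 40.0), (0.0, 0.0)))   # 50 m apart
print(is_geometrically_consistent((90.0, 60.0), (0.0, 0.0)))   # ~108 m apart
```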
### Residual Analysis
- Monitor residual error in factor graph optimization
- Flag if residual deviates by more than 3σ from the mean anchor residual
### Confidence Thresholding
- L3 LiteSAM outputs matching confidence
- Reject anchors with confidence <0.2
### Robust M-Estimation
- Cauchy/Huber kernel automatically down-weights high-residual anchors
- Prevents outliers from corrupting optimization
## Technical Validation Metrics
- **Anchor Attempt Rate**: 30-50% of frames (keyframes only)
- **Anchor Success Rate**: 90-95%
- **Outlier Rate**: <10% (AC-5 requirement)
- **Detection Sensitivity**: >95% (outliers caught)
- **Detection Specificity**: >90% (valid anchors retained)
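Sensitivity and specificity can be computed from ground-truth outlier labels and the detector's flags. The sketch below assumes two parallel boolean lists, one entry per anchor; the function name is illustrative.

```python
def detection_metrics(is_outlier, flagged):
    """Return (sensitivity, specificity); None where a class is empty."""
    if len(is_outlier) != len(flagged):
        raise ValueError("label and flag lists must have the same length")
    pairs = list(zip(is_outlier, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    tn = sum(1 for truth, flag in pairs if not truth and not flag)
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # outliers caught
    specificity = tn / (tn + fp) if (tn + fp) else None  # valid anchors retained
    return sensitivity, specificity


# 25 anchors: 2 true outliers (both flagged), 1 valid anchor wrongly flagged.
truth = [True] * 2 + [False] * 23
flags = [True] * 2 + [True] + [False] * 22
sens, spec = detection_metrics(truth, flags)
print(sens, round(spec, 3))
```

With only two or three true outliers per flight, sensitivity moves in large steps (one miss out of two is 50%), so the >95% target is best assessed across several runs or with injected outliers.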
## Failure Modes Tested
- **Wrong Satellite Tile**: L2 retrieves incorrect location
- **Stale Satellite Data**: Terrain changed significantly
- **Seasonal Mismatch**: Summer satellite vs winter UAV imagery
- **Rotation Error**: L3 estimates incorrect rotation
## Notes
- AC-5 is critical for hybrid localization reliability
- 10% outlier tolerance allows graceful degradation
- Robust M-estimation is the primary outlier defense
- Multiple validation layers provide defense-in-depth
- Valid anchors significantly improve absolute accuracy