add chunking

This commit is contained in:
Oleksandr Bezdieniezhnykh
2025-11-27 03:43:19 +02:00
parent 4f8c18a066
commit 2037870f67
43 changed files with 7041 additions and 4135 deletions
# Metric Refinement
## Interface Definition
**Interface Name**: `IMetricRefinement`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

import numpy as np

# TileBounds, GPSPoint, AlignmentResult, and ChunkAlignmentResult are
# defined in the Data Models section below.

class IMetricRefinement(ABC):
    @abstractmethod
    def align_to_satellite(self, uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]:
        pass

    @abstractmethod
    def compute_homography(self, uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass

    @abstractmethod
    def extract_gps_from_alignment(self, homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint:
        pass

    @abstractmethod
    def compute_match_confidence(self, alignment: AlignmentResult) -> float:
        pass

    @abstractmethod
    def align_chunk_to_satellite(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]:
        pass

    @abstractmethod
    def match_chunk_homography(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass
```
## Component Description
### Responsibilities
- LiteSAM for precise UAV-to-satellite cross-view matching
- **Requires pre-rotated images** from Image Rotation Manager
- Compute homography mapping UAV image to satellite tile
- Extract absolute GPS coordinates from alignment
- Process against single tile (drift correction) or tile grid (progressive search)
- Achieve <20m accuracy requirement
- **Chunk-to-satellite matching (more robust than single-image)**
- **Chunk homography computation**
### Scope
- Cross-view geo-localization (UAV↔satellite)
- Handles altitude variations (<1km)
- Multi-scale processing for different GSDs
- Domain gap (UAV downward vs satellite nadir view)
- **Critical**: Fails if rotation >45° (handled by F06 Image Rotation Manager)
- **Chunk-level matching (aggregate correspondences from multiple images)**
## API Methods
### `align_to_satellite(uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]`
**Description**: Aligns UAV image to satellite tile, returning GPS location.
**Called By**:
- F06 Image Rotation Manager (during rotation sweep)
- F11 Failure Recovery Coordinator (progressive search)
- F02 Flight Processor (drift correction with single tile)
**Input**:
```python
uav_image: np.ndarray # Pre-rotated UAV image
satellite_tile: np.ndarray # Reference satellite tile
tile_bounds: TileBounds # GPS bounds and GSD of the satellite tile
```
**Output**:
```python
AlignmentResult:
matched: bool
homography: np.ndarray # 3×3 transformation matrix
gps_center: GPSPoint # UAV image center GPS
confidence: float
inlier_count: int
total_correspondences: int
```
**Processing Flow**:
1. Extract features from both images using LiteSAM encoder
2. Compute dense correspondence field
3. Estimate homography from correspondences
4. Validate match quality (inlier count, reprojection error)
5. If valid match:
- Extract GPS from homography using tile_bounds
- Return AlignmentResult
6. If no match:
- Return None
**Match Criteria**:
- **Good match**: inlier_count > 30, confidence > 0.7
- **Weak match**: inlier_count 15-30, confidence 0.5-0.7
- **No match**: inlier_count < 15
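The tiers above can be sketched as a small classifier. The `classify_match` name and the handling of boundary cases (e.g. many inliers but middling confidence) are illustrative, not part of the interface:

```python
def classify_match(inlier_count: int, confidence: float) -> str:
    """Tier a match per the criteria above; boundary handling is an assumption."""
    if inlier_count > 30 and confidence > 0.7:
        return "good"
    if inlier_count >= 15 and confidence >= 0.5:
        return "weak"
    return "none"  # caller returns None and e.g. tries the next tile
```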
**Error Conditions**:
- Returns `None`: No match found, e.g. when residual rotation exceeds 45° (inputs should arrive pre-rotated from F06)
**Test Cases**:
1. **Good alignment**: Returns GPS within 20m of ground truth
2. **Altitude variation**: Handles GSD mismatch
3. **Rotation >45°**: Fails (by design, requires pre-rotation)
4. **Multi-scale**: Processes at multiple scales
---
### `compute_homography(uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]`
**Description**: Computes homography transformation from UAV to satellite.
**Called By**:
- Internal (during align_to_satellite)
**Input**:
```python
uav_image: np.ndarray
satellite_tile: np.ndarray
```
**Output**:
```python
Optional[np.ndarray]: 3×3 homography matrix or None
```
**Algorithm (LiteSAM)**:
1. Extract multi-scale features using TAIFormer
2. Compute correlation via Convolutional Token Mixer (CTM)
3. Generate dense correspondences
4. Estimate homography using RANSAC
5. Refine with non-linear optimization
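The estimation core of step 4 can be sketched with a plain Direct Linear Transform in numpy; a production version wraps this in the RANSAC loop and applies the non-linear refinement of step 5. The function name is illustrative:

```python
import numpy as np

def estimate_homography_dlt(src_pts, dst_pts):
    """Direct Linear Transform: find H such that dst ~ H @ src (homogeneous).

    Minimal sketch of the homography fit; no outlier rejection here.
    """
    assert len(src_pts) >= 4, "8 DoF need at least 4 correspondences"
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + h9), cross-multiplied
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # null vector for the smallest singular value
    return H / H[2, 2]        # fix scale: 8 degrees of freedom remain
```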
**Homography Properties**:
- Maps pixels from UAV image to satellite image
- Accounts for: scale, rotation, perspective
- 8 DoF (degrees of freedom)
**Error Conditions**:
- Returns `None`: Insufficient correspondences
**Test Cases**:
1. **Valid correspondence**: Returns 3×3 matrix
2. **Insufficient features**: Returns None
---
### `extract_gps_from_alignment(homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint`
**Description**: Extracts GPS coordinates from homography and tile georeferencing.
**Called By**:
- Internal (during align_to_satellite)
- F06 Image Rotation Manager (for precise angle calculation)
**Input**:
```python
homography: np.ndarray # 3×3 matrix
tile_bounds: TileBounds # GPS bounds of satellite tile
image_center: Tuple[int, int] # Center pixel of UAV image
```
**Output**:
```python
GPSPoint:
lat: float
lon: float
```
**Algorithm**:
1. Apply homography to UAV image center point
2. Get pixel coordinates in satellite tile
3. Convert satellite pixel to GPS using tile_bounds and GSD
4. Return GPS coordinates
**Uses**: tile_bounds parameter, H02 GSD Calculator
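A minimal numpy sketch of the algorithm above, using plain dicts in place of `TileBounds` and simple linear interpolation across the tile bounds (a small-tile approximation; in practice the H02 GSD Calculator owns this conversion, and the tile size default is an assumption):

```python
import numpy as np

def extract_gps_from_alignment(H, tile_bounds, image_center, tile_size=(256, 256)):
    """Map the UAV image center through H, then interpolate GPS in the tile.

    tile_bounds: dict with 'nw', 'ne', 'sw' -> (lat, lon) tuples, standing in
    for the TileBounds model. Linear interpolation is a small-tile approximation.
    """
    cx, cy = image_center
    p = H @ np.array([cx, cy, 1.0])            # step 1: apply homography
    px, py = p[0] / p[2], p[1] / p[2]          # step 2: pixel in satellite tile
    w, h = tile_size
    nw_lat, nw_lon = tile_bounds["nw"]
    lat = nw_lat + (py / h) * (tile_bounds["sw"][0] - nw_lat)  # rows: north -> south
    lon = nw_lon + (px / w) * (tile_bounds["ne"][1] - nw_lon)  # cols: west -> east
    return lat, lon                            # step 4
```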
**Test Cases**:
1. **Center alignment**: UAV center → correct GPS
2. **Corner alignment**: UAV corner → correct GPS
3. **Multiple points**: All points consistent
---
### `compute_match_confidence(alignment: AlignmentResult) -> float`
**Description**: Computes match confidence score from alignment quality.
**Called By**:
- Internal (during align_to_satellite)
- F11 Failure Recovery Coordinator (to decide if match acceptable)
**Input**:
```python
alignment: AlignmentResult
```
**Output**:
```python
float: Confidence score (0.0 to 1.0)
```
**Confidence Factors**:
1. **Inlier ratio**: inliers / total_correspondences
2. **Inlier count**: Absolute number of inliers
3. **Reprojection error**: Mean error of inliers (in pixels)
4. **Spatial distribution**: Inliers well-distributed vs clustered
**Thresholds**:
- **High confidence (>0.8)**: inlier_ratio > 0.6, inlier_count > 50, MRE < 0.5px
- **Medium confidence (0.5-0.8)**: inlier_ratio > 0.4, inlier_count > 30
- **Low confidence (<0.5)**: Reject match
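One way to blend factors 1-3 into a 0-1 score. The weights and saturation points below are assumptions chosen to roughly reproduce the thresholds above, and the spatial-distribution factor is omitted for brevity:

```python
def compute_match_confidence(inliers, total, mean_reproj_err_px):
    """Blend inlier ratio, saturating inlier count, and reprojection error."""
    if total == 0:
        return 0.0
    ratio = inliers / total                          # factor 1: inlier ratio
    count = min(inliers / 50.0, 1.0)                 # factor 2: saturates at 50
    err = max(0.0, 1.0 - mean_reproj_err_px / 2.0)   # factor 3: zero at 2 px
    return 0.4 * ratio + 0.3 * count + 0.3 * err     # illustrative weights
```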
**Test Cases**:
1. **Good match**: confidence > 0.8
2. **Weak match**: confidence 0.5-0.7
3. **Poor match**: confidence < 0.5
---
### `align_chunk_to_satellite(chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]`
**Description**: Aligns entire chunk to satellite tile, returning GPS location.
**Called By**:
- F06 Image Rotation Manager (during chunk rotation sweep)
- F11 Failure Recovery Coordinator (chunk LiteSAM matching)
**Input**:
```python
chunk_images: List[np.ndarray] # Pre-rotated chunk images (5-20 images)
satellite_tile: np.ndarray # Reference satellite tile
tile_bounds: TileBounds # GPS bounds and GSD of the satellite tile
```
**Output**:
```python
ChunkAlignmentResult:
matched: bool
chunk_id: str
chunk_center_gps: GPSPoint # GPS of chunk center (middle frame)
rotation_angle: float
confidence: float
inlier_count: int
transform: Sim3Transform
```
**Processing Flow**:
1. For each image in chunk:
- Extract features using LiteSAM encoder
- Compute correspondences with satellite tile
2. Aggregate correspondences from all images
3. Estimate homography from aggregate correspondences
4. Validate match quality (inlier count, reprojection error)
5. If valid match:
- Extract GPS from chunk center using tile_bounds
- Compute Sim(3) transform (translation, rotation, scale)
- Return ChunkAlignmentResult
6. If no match:
- Return None
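Steps 1-3 hinge on pooling correspondences from every image before a single homography fit; a minimal sketch with the per-image matching stubbed out (names and shapes are assumptions):

```python
import numpy as np

def aggregate_correspondences(per_image_matches):
    """Stack (uav_pts, sat_pts) pairs from all chunk images into one set.

    per_image_matches: list of (uav_pts, sat_pts) tuples, one per image,
    each array of shape (Ni, 2). Feeding the combined set to one homography
    estimate is why chunk matching tolerates images with few correspondences.
    """
    uav = np.vstack([m[0] for m in per_image_matches])
    sat = np.vstack([m[1] for m in per_image_matches])
    return uav, sat
```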
**Match Criteria**:
- **Good match**: inlier_count > 50, confidence > 0.7
- **Weak match**: inlier_count 30-50, confidence 0.5-0.7
- **No match**: inlier_count < 30
**Advantages over Single-Image Matching**:
- More correspondences (aggregate from multiple images)
- More robust to featureless terrain
- Better handles partial occlusions
- Higher confidence scores
**Test Cases**:
1. **Chunk alignment**: Returns GPS within 20m of ground truth
2. **Featureless terrain**: Succeeds where single-image fails
3. **Rotation >45°**: Fails (requires pre-rotation via F06)
4. **Multi-scale**: Handles GSD mismatch
---
### `match_chunk_homography(chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]`
**Description**: Computes homography transformation from chunk to satellite.
**Called By**:
- Internal (during align_chunk_to_satellite)
**Input**:
```python
chunk_images: List[np.ndarray]
satellite_tile: np.ndarray
```
**Output**:
```python
Optional[np.ndarray]: 3×3 homography matrix or None
```
**Algorithm (LiteSAM)**:
1. Extract multi-scale features from all chunk images using TAIFormer
2. Aggregate features (mean or max pooling)
3. Compute correlation via Convolutional Token Mixer (CTM)
4. Generate dense correspondences
5. Estimate homography using RANSAC
6. Refine with non-linear optimization
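Step 2's "mean or max pooling" over per-image feature maps, sketched in numpy under an assumed `(C, H, W)` feature-map layout:

```python
import numpy as np

def pool_chunk_features(features, mode="mean"):
    """Pool per-image feature maps (each C x H x W) into one chunk-level map."""
    stack = np.stack(features)  # (N, C, H, W)
    return stack.mean(axis=0) if mode == "mean" else stack.max(axis=0)
```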
**Homography Properties**:
- Maps pixels from chunk center to satellite image
- Accounts for: scale, rotation, perspective
- 8 DoF (degrees of freedom)
**Test Cases**:
1. **Valid correspondence**: Returns 3×3 matrix
2. **Insufficient features**: Returns None
3. **Aggregate correspondences**: More robust than single-image
## Integration Tests
### Test 1: Single Tile Drift Correction
1. Load UAV image and expected satellite tile
2. Pre-rotate UAV image to known heading
3. align_to_satellite() → returns GPS
4. Verify GPS within 20m of ground truth
### Test 2: Progressive Search (4 tiles)
1. Load UAV image from sharp turn
2. Get 2×2 tile grid from F04
3. align_to_satellite() for each tile (with tile_bounds)
4. First 3 tiles: No match
5. 4th tile: Match found → GPS extracted
### Test 3: Rotation Sensitivity
1. Rotate UAV image by 60° (not pre-rotated)
2. align_to_satellite() → returns None (fails as expected)
3. Pre-rotate to 60°
4. align_to_satellite() → succeeds
### Test 4: Multi-Scale Robustness
1. UAV at 500m altitude (GSD=0.1m/pixel)
2. Satellite at zoom 19 (GSD=0.3m/pixel)
3. LiteSAM handles scale difference → match succeeds
### Test 5: Chunk LiteSAM Matching
1. Build chunk with 10 images (plain field scenario)
2. Pre-rotate chunk to known heading
3. align_chunk_to_satellite() → returns GPS
4. Verify GPS within 20m of ground truth
5. Verify chunk matching more robust than single-image
### Test 6: Chunk Rotation Sweeps
1. Build chunk with unknown orientation
2. Try chunk rotation steps (0°, 30°, ..., 330°)
3. align_chunk_to_satellite() for each rotation
4. Match found at 120° → GPS extracted
5. Verify Sim(3) transform computed correctly
## Non-Functional Requirements
### Performance
- **align_to_satellite**: ~60ms per tile (TensorRT optimized)
- **Progressive search 25 tiles**: ~1.5 seconds total (25 × 60ms)
- Meets <5s per frame requirement
### Accuracy
- **GPS accuracy**: 60% of frames < 20m error, 80% < 50m error
- **Mean Reprojection Error (MRE)**: < 1.0 pixels
- **Alignment success rate**: > 95% when rotation correct
### Reliability
- Graceful failure when no match
- Robust to altitude variations (<1km)
- Handles seasonal appearance changes (to extent possible)
## Dependencies
### Internal Components
- **F16 Model Manager**: For LiteSAM model
- **H01 Camera Model**: For projection operations
- **H02 GSD Calculator**: For coordinate transformations
- **H05 Performance Monitor**: For timing
- **F12 Route Chunk Manager**: For chunk image retrieval
**Note**: tile_bounds is passed as parameter from caller (F02 Flight Processor gets it from F04 Satellite Data Manager)
### External Dependencies
- **LiteSAM**: Cross-view matching model
- **opencv-python**: Homography estimation
- **numpy**: Matrix operations
## Data Models
### AlignmentResult
```python
from pydantic import BaseModel, ConfigDict
import numpy as np

class AlignmentResult(BaseModel):
    # np.ndarray is not a native pydantic type; arbitrary_types_allowed is
    # required (pydantic v2) here and in the other array-bearing models
    model_config = ConfigDict(arbitrary_types_allowed=True)

    matched: bool
    homography: np.ndarray # (3, 3)
    gps_center: GPSPoint
    confidence: float
    inlier_count: int
    total_correspondences: int
    reprojection_error: float # Mean error in pixels
```
### GPSPoint
```python
class GPSPoint(BaseModel):
lat: float
lon: float
```
### TileBounds
```python
class TileBounds(BaseModel):
nw: GPSPoint
ne: GPSPoint
sw: GPSPoint
se: GPSPoint
center: GPSPoint
gsd: float # Ground Sampling Distance (m/pixel)
```
### LiteSAMConfig
```python
class LiteSAMConfig(BaseModel):
model_path: str
confidence_threshold: float = 0.7
min_inliers: int = 15
max_reprojection_error: float = 2.0 # pixels
multi_scale_levels: int = 3
chunk_min_inliers: int = 30 # Higher threshold for chunk matching
```
### ChunkAlignmentResult
```python
class ChunkAlignmentResult(BaseModel):
matched: bool
chunk_id: str
chunk_center_gps: GPSPoint
rotation_angle: float
confidence: float
inlier_count: int
transform: Sim3Transform # Translation, rotation, scale
reprojection_error: float # Mean error in pixels
```
### Sim3Transform
```python
class Sim3Transform(BaseModel):
translation: np.ndarray # (3,) - translation vector
rotation: np.ndarray # (3, 3) rotation matrix or (4,) quaternion
scale: float # Scale factor
```
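For reference, applying a `Sim3Transform` to a 3D point follows p' = scale · R · p + t; a small numpy sketch consistent with the fields above, assuming the rotation is given as a 3×3 matrix:

```python
import numpy as np

def apply_sim3(translation, rotation, scale, point):
    """p' = scale * R @ p + t, matching the Sim3Transform fields."""
    return scale * rotation @ point + translation
```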