initial structure implemented

docs -> _docs
Oleksandr Bezdieniezhnykh
2025-12-01 14:20:56 +02:00
parent 9134c5db06
commit abc26d5c20
360 changed files with 3881 additions and 101 deletions
@@ -0,0 +1,55 @@
# Feature: Single Image Alignment
## Description
Core UAV-to-satellite cross-view matching for individual frames using LiteSAM. Computes precise GPS coordinates by aligning a pre-rotated UAV image to a georeferenced satellite tile through homography estimation.
## Component APIs Implemented
- `align_to_satellite(uav_image, satellite_tile, tile_bounds) -> AlignmentResult`
- `compute_homography(uav_image, satellite_tile) -> Optional[np.ndarray]`
- `extract_gps_from_alignment(homography, tile_bounds, image_center) -> GPSPoint`
- `compute_match_confidence(alignment) -> float`
## External Tools and Services
- **LiteSAM**: Cross-view matching model (TAIFormer encoder, CTM correlation)
- **opencv-python**: RANSAC homography estimation, image operations
- **numpy**: Matrix operations, coordinate transformations
## Internal Methods
| Method | Purpose |
|--------|---------|
| `_extract_features(image)` | Extract multi-scale features using LiteSAM TAIFormer encoder |
| `_compute_correspondences(uav_features, sat_features)` | Compute dense correspondence field via CTM |
| `_estimate_homography_ransac(correspondences)` | Estimate 3×3 homography using RANSAC |
| `_refine_homography(homography, correspondences)` | Non-linear refinement of homography |
| `_validate_match(homography, inliers)` | Check inlier count/ratio thresholds |
| `_pixel_to_gps(pixel, tile_bounds)` | Convert satellite pixel coordinates to GPS |
| `_compute_inlier_ratio(inliers, total)` | Calculate inlier ratio for confidence |
| `_compute_spatial_distribution(inliers)` | Assess inlier spatial distribution quality |
| `_compute_reprojection_error(homography, correspondences)` | Calculate mean reprojection error |
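The spatial-distribution score is not pinned down by this spec; one illustrative sketch (grid-occupancy scoring, all names hypothetical) of what `_compute_spatial_distribution` could do:

```python
import numpy as np

def compute_spatial_distribution(inlier_points: np.ndarray,
                                 image_shape: tuple,
                                 grid: int = 4) -> float:
    """Score in [0, 1]: fraction of grid cells containing at least one inlier.

    Clustered inliers occupy few cells and score low; well-spread inliers
    approach 1.0. `inlier_points` is an (N, 2) array of (x, y) pixels.
    """
    if len(inlier_points) == 0:
        return 0.0
    h, w = image_shape
    # Map each (x, y) point to a grid cell index, clamped to the grid.
    cols = np.clip((inlier_points[:, 0] / w * grid).astype(int), 0, grid - 1)
    rows = np.clip((inlier_points[:, 1] / h * grid).astype(int), 0, grid - 1)
    occupied = len(set(zip(rows.tolist(), cols.tolist())))
    return occupied / (grid * grid)
```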
## Unit Tests
1. **Feature extraction**: LiteSAM encoder produces valid feature tensors
2. **Correspondence computation**: CTM produces dense correspondence field
3. **Homography estimation**: RANSAC returns valid 3×3 matrix for good correspondences
4. **Homography estimation failure**: Returns None for insufficient correspondences (<15 inliers)
5. **GPS extraction accuracy**: Pixel-to-GPS conversion within expected tolerance
6. **Confidence high**: Returns >0.8 for inlier_ratio >0.6, inlier_count >50, MRE <0.5px
7. **Confidence medium**: Returns 0.5-0.8 for moderate match quality
8. **Confidence low**: Returns <0.5 for poor matches
9. **Reprojection error calculation**: Correctly computes mean pixel error
10. **Spatial distribution scoring**: Penalizes clustered inliers
## Integration Tests
1. **Single tile drift correction**: Load UAV image + satellite tile → align_to_satellite() returns GPS within 20m of ground truth
2. **Progressive search (4 tiles)**: align_to_satellite() on 2×2 grid, first 3 fail, 4th succeeds
3. **Rotation sensitivity**: Image with residual rotation >45° fails; pre-rotated image succeeds
4. **Multi-scale robustness**: Different GSD (UAV 0.1m/px, satellite 0.3m/px) → match succeeds
5. **Altitude variation**: UAV at various altitudes (<1km) → consistent GPS accuracy
6. **Performance benchmark**: align_to_satellite() completes in ~60ms (TensorRT)
@@ -0,0 +1,51 @@
# Feature: Chunk Alignment
## Description
Batch UAV-to-satellite matching that aggregates correspondences from multiple images in a chunk for more robust geo-localization. Handles scenarios where single-image matching fails (featureless terrain, partial occlusions). Returns Sim(3) transform for the entire chunk.
## Component APIs Implemented
- `align_chunk_to_satellite(chunk_images, satellite_tile, tile_bounds) -> ChunkAlignmentResult`
- `match_chunk_homography(chunk_images, satellite_tile) -> Optional[np.ndarray]`
## External Tools and Services
- **LiteSAM**: Cross-view matching model (TAIFormer encoder, CTM correlation)
- **opencv-python**: RANSAC homography estimation
- **numpy**: Matrix operations, feature aggregation
## Internal Methods
| Method | Purpose |
|--------|---------|
| `_extract_chunk_features(chunk_images)` | Extract features from all chunk images |
| `_aggregate_features(features_list)` | Combine features via mean/max pooling |
| `_aggregate_correspondences(correspondences_list)` | Merge correspondences from multiple images |
| `_estimate_chunk_homography(aggregated_correspondences)` | Estimate homography from aggregate data |
| `_compute_sim3_transform(homography, tile_bounds)` | Extract translation, rotation, scale |
| `_get_chunk_center_gps(homography, tile_bounds, chunk_images)` | GPS of middle frame center |
| `_validate_chunk_match(inliers, confidence)` | Check chunk-specific thresholds (>30 inliers) |
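At its simplest, correspondence aggregation is concatenation of the per-image correspondence sets so that RANSAC sees one large sample. A sketch (the (N, 4) row layout is an assumption):

```python
import numpy as np

def aggregate_correspondences(correspondences_list):
    """Merge per-image correspondence sets into one array for RANSAC.

    Each element is an (N_i, 4) array of [uav_x, uav_y, sat_x, sat_y] rows;
    UAV-side coordinates are assumed already expressed in a shared chunk frame.
    """
    non_empty = [c for c in correspondences_list if len(c) > 0]
    if not non_empty:
        return np.empty((0, 4))
    return np.concatenate(non_empty, axis=0)
```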
## Unit Tests
1. **Feature aggregation**: Mean pooling produces valid combined features
2. **Correspondence aggregation**: Merges correspondences from N images correctly
3. **Chunk homography estimation**: Returns valid 3×3 matrix for aggregate correspondences
4. **Chunk homography failure**: Returns None for insufficient aggregate correspondences
5. **Sim(3) extraction**: Correctly decomposes homography into translation, rotation, scale
6. **Chunk center GPS**: Returns GPS of middle frame's center pixel
7. **Chunk confidence high**: Returns >0.7 for >50 inliers
8. **Chunk confidence medium**: Returns 0.5-0.7 for 30-50 inliers
9. **Chunk validation**: Rejects matches with <30 inliers
## Integration Tests
1. **Chunk LiteSAM matching**: 10 images from plain field → align_chunk_to_satellite() returns GPS within 20m
2. **Chunk vs single-image robustness**: Featureless terrain where single-image fails, chunk succeeds
3. **Chunk rotation sweeps**: Unknown orientation → try rotations (0°, 30°, ..., 330°) → match at correct angle
4. **Sim(3) transform correctness**: Verify transform aligns chunk trajectory to satellite coordinates
5. **Multi-scale chunk matching**: GSD mismatch handled correctly
6. **Performance benchmark**: 10-image chunk alignment completes within acceptable time
7. **Partial occlusion handling**: Some images occluded → chunk still matches successfully
@@ -0,0 +1,462 @@
# Metric Refinement
## Interface Definition
**Interface Name**: `IMetricRefinement`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

import numpy as np


class IMetricRefinement(ABC):
    @abstractmethod
    def align_to_satellite(self, uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]:
        pass

    @abstractmethod
    def compute_homography(self, uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass

    @abstractmethod
    def extract_gps_from_alignment(self, homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint:
        pass

    @abstractmethod
    def compute_match_confidence(self, alignment: AlignmentResult) -> float:
        pass

    @abstractmethod
    def align_chunk_to_satellite(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]:
        pass

    @abstractmethod
    def match_chunk_homography(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass
```
## Component Description
### Responsibilities
- LiteSAM for precise UAV-to-satellite cross-view matching
- **Requires pre-rotated images** from Image Rotation Manager
- Compute homography mapping UAV image to satellite tile
- Extract absolute GPS coordinates from alignment
- Process against single tile (drift correction) or tile grid (progressive search)
- Achieve <20m accuracy requirement
- **Chunk-to-satellite matching (more robust than single-image)**
- **Chunk homography computation**
### Scope
- Cross-view geo-localization (UAV↔satellite)
- Handles altitude variations (<1km)
- Multi-scale processing for different GSDs
- Domain gap (UAV downward vs satellite nadir view)
- **Critical**: Fails if rotation >45° (handled by F06)
- **Chunk-level matching (aggregate correspondences from multiple images)**
## API Methods
### `align_to_satellite(uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]`
**Description**: Aligns UAV image to satellite tile, returning GPS location.
**Called By**:
- F06 Image Rotation Manager (during rotation sweep)
- F11 Failure Recovery Coordinator (progressive search)
- F02.2 Flight Processing Engine (drift correction with single tile)
**Input**:
```python
uav_image: np.ndarray # Pre-rotated UAV image
satellite_tile: np.ndarray # Reference satellite tile
tile_bounds: TileBounds # GPS bounds and GSD of the satellite tile
```
**Output**:
```python
AlignmentResult:
    matched: bool
    homography: np.ndarray  # 3×3 transformation matrix
    gps_center: GPSPoint  # UAV image center GPS
    confidence: float
    inlier_count: int
    total_correspondences: int
```
**Processing Flow**:
1. Extract features from both images using LiteSAM encoder
2. Compute dense correspondence field
3. Estimate homography from correspondences
4. Validate match quality (inlier count, reprojection error)
5. If valid match:
   - Extract GPS from homography using tile_bounds
   - Return AlignmentResult
6. If no match:
   - Return None
**Match Criteria**:
- **Good match**: inlier_count > 30, confidence > 0.7
- **Weak match**: inlier_count 15-30, confidence 0.5-0.7
- **No match**: inlier_count < 15
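The tiers above can be captured in a small classifier (a sketch; the helper name is hypothetical):

```python
def classify_match(inlier_count: int, confidence: float) -> str:
    """Map inlier count and confidence onto the good/weak/none match tiers."""
    if inlier_count > 30 and confidence > 0.7:
        return "good"
    if inlier_count >= 15 and confidence >= 0.5:
        return "weak"
    return "none"  # inlier_count < 15 or confidence below the weak band
```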
**Error Conditions**:
- Returns `None`: no match found, e.g. when residual rotation exceeds 45° (images should be pre-rotated)
**Test Cases**:
1. **Good alignment**: Returns GPS within 20m of ground truth
2. **Altitude variation**: Handles GSD mismatch
3. **Rotation >45°**: Fails (by design, requires pre-rotation)
4. **Multi-scale**: Processes at multiple scales
---
### `compute_homography(uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]`
**Description**: Computes homography transformation from UAV to satellite.
**Called By**:
- Internal (during align_to_satellite)
**Input**:
```python
uav_image: np.ndarray
satellite_tile: np.ndarray
```
**Output**:
```python
Optional[np.ndarray]: 3×3 homography matrix or None
```
**Algorithm (LiteSAM)**:
1. Extract multi-scale features using TAIFormer
2. Compute correlation via Convolutional Token Mixer (CTM)
3. Generate dense correspondences
4. Estimate homography using RANSAC
5. Refine with non-linear optimization
**Homography Properties**:
- Maps pixels from UAV image to satellite image
- Accounts for: scale, rotation, perspective
- 8 DoF (degrees of freedom)
**Error Conditions**:
- Returns `None`: Insufficient correspondences
**Test Cases**:
1. **Valid correspondence**: Returns 3×3 matrix
2. **Insufficient features**: Returns None
---
### `extract_gps_from_alignment(homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint`
**Description**: Extracts GPS coordinates from homography and tile georeferencing.
**Called By**:
- Internal (during align_to_satellite)
- F06 Image Rotation Manager (for precise angle calculation)
**Input**:
```python
homography: np.ndarray # 3×3 matrix
tile_bounds: TileBounds # GPS bounds of satellite tile
image_center: Tuple[int, int] # Center pixel of UAV image
```
**Output**:
```python
GPSPoint:
    lat: float
    lon: float
```
**Algorithm**:
1. Apply homography to UAV image center point
2. Get pixel coordinates in satellite tile
3. Convert satellite pixel to GPS using tile_bounds and GSD
4. Return GPS coordinates
**Uses**: tile_bounds parameter, H02 GSD Calculator
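For a north-up tile, the conversion reduces to projecting the center pixel through H and interpolating between the tile's corner coordinates. A sketch with simplified, hypothetical parameter types (production code would go through the H02 GSD Calculator):

```python
import numpy as np

def pixel_to_gps(pixel_xy, homography, nw, se, tile_size):
    """Project a UAV pixel into the satellite tile via H, then linearly
    interpolate lat/lon between the tile's NW and SE corners.

    Assumes an axis-aligned (north-up) tile so interpolation is separable.
    `nw`/`se` are (lat, lon) tuples; `tile_size` is (height_px, width_px).
    """
    x, y, w = homography @ np.array([pixel_xy[0], pixel_xy[1], 1.0])
    sx, sy = x / w, y / w                  # satellite pixel coordinates
    h_px, w_px = tile_size
    frac_x, frac_y = sx / w_px, sy / h_px  # fractional position in the tile
    lat = nw[0] + frac_y * (se[0] - nw[0])
    lon = nw[1] + frac_x * (se[1] - nw[1])
    return lat, lon
```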
**Test Cases**:
1. **Center alignment**: UAV center → correct GPS
2. **Corner alignment**: UAV corner → correct GPS
3. **Multiple points**: All points consistent
---
### `compute_match_confidence(alignment: AlignmentResult) -> float`
**Description**: Computes match confidence score from alignment quality.
**Called By**:
- Internal (during align_to_satellite)
- F11 Failure Recovery Coordinator (to decide if match acceptable)
**Input**:
```python
alignment: AlignmentResult
```
**Output**:
```python
float: Confidence score (0.0 to 1.0)
```
**Confidence Factors**:
1. **Inlier ratio**: inliers / total_correspondences
2. **Inlier count**: Absolute number of inliers
3. **Reprojection error**: Mean error of inliers (in pixels)
4. **Spatial distribution**: Inliers well-distributed vs clustered
**Thresholds**:
- **High confidence (>0.8)**: inlier_ratio > 0.6, inlier_count > 50, MRE < 0.5px
- **Medium confidence (0.5-0.8)**: inlier_ratio > 0.4, inlier_count > 30
- **Low confidence (<0.5)**: Reject match
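One way to combine the four factors into a single score is a weighted sum with saturating normalizers. The weights below are illustrative only, chosen so the stated thresholds roughly hold:

```python
def compute_confidence(inlier_ratio: float,
                       inlier_count: int,
                       reproj_error_px: float,
                       spatial_score: float) -> float:
    """Weighted combination of the four confidence factors, clipped to [0, 1]."""
    count_term = min(inlier_count / 50.0, 1.0)          # saturates at 50 inliers
    error_term = max(0.0, 1.0 - reproj_error_px / 2.0)  # hits 0 at 2 px MRE
    score = (0.35 * inlier_ratio
             + 0.25 * count_term
             + 0.25 * error_term
             + 0.15 * spatial_score)
    return max(0.0, min(1.0, score))
```

For example, `inlier_ratio=0.7, inlier_count=60, MRE=0.4 px` with well-spread inliers lands above 0.8, while a sparse, high-error match falls below 0.5.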
**Test Cases**:
1. **Good match**: confidence > 0.8
2. **Weak match**: confidence 0.5-0.7
3. **Poor match**: confidence < 0.5
---
### `align_chunk_to_satellite(chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]`
**Description**: Aligns entire chunk to satellite tile, returning GPS location.
**Called By**:
- F06 Image Rotation Manager (during chunk rotation sweep)
- F11 Failure Recovery Coordinator (chunk LiteSAM matching)
**Input**:
```python
chunk_images: List[np.ndarray] # Pre-rotated chunk images (5-20 images)
satellite_tile: np.ndarray # Reference satellite tile
tile_bounds: TileBounds # GPS bounds and GSD of the satellite tile
```
**Output**:
```python
ChunkAlignmentResult:
    matched: bool
    chunk_id: str
    chunk_center_gps: GPSPoint  # GPS of chunk center (middle frame)
    rotation_angle: float
    confidence: float
    inlier_count: int
    transform: Sim3Transform
```
**Processing Flow**:
1. For each image in chunk:
   - Extract features using LiteSAM encoder
   - Compute correspondences with satellite tile
2. Aggregate correspondences from all images
3. Estimate homography from aggregate correspondences
4. Validate match quality (inlier count, reprojection error)
5. If valid match:
   - Extract GPS from chunk center using tile_bounds
   - Compute Sim(3) transform (translation, rotation, scale)
   - Return ChunkAlignmentResult
6. If no match:
   - Return None
**Match Criteria**:
- **Good match**: inlier_count > 50, confidence > 0.7
- **Weak match**: inlier_count 30-50, confidence 0.5-0.7
- **No match**: inlier_count < 30
**Advantages over Single-Image Matching**:
- More correspondences (aggregate from multiple images)
- More robust to featureless terrain
- Better handles partial occlusions
- Higher confidence scores
**Test Cases**:
1. **Chunk alignment**: Returns GPS within 20m of ground truth
2. **Featureless terrain**: Succeeds where single-image fails
3. **Rotation >45°**: Fails (requires pre-rotation via F06)
4. **Multi-scale**: Handles GSD mismatch
---
### `match_chunk_homography(chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]`
**Description**: Computes homography transformation from chunk to satellite.
**Called By**:
- Internal (during align_chunk_to_satellite)
**Input**:
```python
chunk_images: List[np.ndarray]
satellite_tile: np.ndarray
```
**Output**:
```python
Optional[np.ndarray]: 3×3 homography matrix or None
```
**Algorithm (LiteSAM)**:
1. Extract multi-scale features from all chunk images using TAIFormer
2. Aggregate features (mean or max pooling)
3. Compute correlation via Convolutional Token Mixer (CTM)
4. Generate dense correspondences
5. Estimate homography using RANSAC
6. Refine with non-linear optimization
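Step 2's pooling is a simple reduction over the per-image feature maps (a sketch; the (C, H, W) layout is an assumption about the LiteSAM encoder output):

```python
import numpy as np

def aggregate_features(features_list, mode="mean"):
    """Pool per-image feature maps, each (C, H, W), into one chunk feature map."""
    stack = np.stack(features_list, axis=0)  # (N, C, H, W)
    if mode == "mean":
        return stack.mean(axis=0)
    if mode == "max":
        return stack.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")
```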
**Homography Properties**:
- Maps pixels from chunk center to satellite image
- Accounts for: scale, rotation, perspective
- 8 DoF (degrees of freedom)
**Test Cases**:
1. **Valid correspondence**: Returns 3×3 matrix
2. **Insufficient features**: Returns None
3. **Aggregate correspondences**: More robust than single-image
## Integration Tests
### Test 1: Single Tile Drift Correction
1. Load UAV image and expected satellite tile
2. Pre-rotate UAV image to known heading
3. align_to_satellite() → returns GPS
4. Verify GPS within 20m of ground truth
### Test 2: Progressive Search (4 tiles)
1. Load UAV image from sharp turn
2. Get 2×2 tile grid from F04
3. align_to_satellite() for each tile (with tile_bounds)
4. First 3 tiles: No match
5. 4th tile: Match found → GPS extracted
### Test 3: Rotation Sensitivity
1. Rotate UAV image by 60° (not pre-rotated)
2. align_to_satellite() → returns None (fails as expected)
3. Pre-rotate to 60°
4. align_to_satellite() → succeeds
### Test 4: Multi-Scale Robustness
1. UAV at 500m altitude (GSD=0.1m/pixel)
2. Satellite at zoom 19 (GSD=0.3m/pixel)
3. LiteSAM handles scale difference → match succeeds
### Test 5: Chunk LiteSAM Matching
1. Build chunk with 10 images (plain field scenario)
2. Pre-rotate chunk to known heading
3. align_chunk_to_satellite() → returns GPS
4. Verify GPS within 20m of ground truth
5. Verify chunk matching more robust than single-image
### Test 6: Chunk Rotation Sweeps
1. Build chunk with unknown orientation
2. Try chunk rotation steps (0°, 30°, ..., 330°)
3. align_chunk_to_satellite() for each rotation
4. Match found at 120° → GPS extracted
5. Verify Sim(3) transform computed correctly
## Non-Functional Requirements
### Performance
- **align_to_satellite**: ~60ms per tile (TensorRT optimized)
- **Progressive search 25 tiles**: ~1.5 seconds total (25 × 60ms)
- Meets <5s per frame requirement
### Accuracy
- **GPS accuracy**: 60% of frames < 20m error, 80% < 50m error
- **Mean Reprojection Error (MRE)**: < 1.0 pixels
- **Alignment success rate**: > 95% when rotation correct
### Reliability
- Graceful failure when no match
- Robust to altitude variations (<1km)
- Handles seasonal appearance changes (to the extent possible)
## Dependencies
### Internal Components
- **F12 Route Chunk Manager**: For chunk image retrieval and chunk operations
- **F16 Model Manager**: For LiteSAM model
- **H01 Camera Model**: For projection operations
- **H02 GSD Calculator**: For coordinate transformations
- **H05 Performance Monitor**: For timing
**Critical Dependency on F06 Image Rotation Manager**:
- F09 requires pre-rotated images (rotation <45° from north)
- Caller (F06 or F11) must pre-rotate images using F06.rotate_image_360() before calling F09.align_to_satellite()
- If rotation >45°, F09 will fail to match (by design)
- F06 handles the rotation sweep (trying 0°, 30°, 60°, etc.) and calls F09 for each rotation
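The sweep itself is a simple loop over candidate headings (a sketch; `align_at_angle` is a hypothetical stand-in for "rotate via F06, then call F09"):

```python
def rotation_sweep(align_at_angle, step_deg=30):
    """Try each candidate heading until the aligner reports a match.

    `align_at_angle(angle)` rotates the image by `angle` degrees and runs
    alignment, returning a result object or None. Returns the first
    matching (angle, result) pair, or (None, None) if the sweep fails.
    """
    for angle in range(0, 360, step_deg):
        result = align_at_angle(angle)
        if result is not None:
            return angle, result
    return None, None
```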
**Note**: tile_bounds is passed as parameter from caller (F02.2 Flight Processing Engine gets it from F04 Satellite Data Manager)
### External Dependencies
- **LiteSAM**: Cross-view matching model
- **opencv-python**: Homography estimation
- **numpy**: Matrix operations
## Data Models
### AlignmentResult
```python
from pydantic import BaseModel, ConfigDict
import numpy as np


class AlignmentResult(BaseModel):
    # np.ndarray fields require opting in to arbitrary types (pydantic v2)
    model_config = ConfigDict(arbitrary_types_allowed=True)

    matched: bool
    homography: np.ndarray  # (3, 3)
    gps_center: GPSPoint
    confidence: float
    inlier_count: int
    total_correspondences: int
    reprojection_error: float  # Mean error in pixels
```
### GPSPoint
```python
class GPSPoint(BaseModel):
    lat: float
    lon: float
```
### TileBounds
```python
class TileBounds(BaseModel):
    nw: GPSPoint
    ne: GPSPoint
    sw: GPSPoint
    se: GPSPoint
    center: GPSPoint
    gsd: float  # Ground Sampling Distance (m/pixel)
```
### LiteSAMConfig
```python
class LiteSAMConfig(BaseModel):
    model_path: str
    confidence_threshold: float = 0.7
    min_inliers: int = 15
    max_reprojection_error: float = 2.0  # pixels
    multi_scale_levels: int = 3
    chunk_min_inliers: int = 30  # Higher threshold for chunk matching
```
### ChunkAlignmentResult
```python
class ChunkAlignmentResult(BaseModel):
    matched: bool
    chunk_id: str
    chunk_center_gps: GPSPoint
    rotation_angle: float
    confidence: float
    inlier_count: int
    transform: Sim3Transform  # Translation, rotation, scale
    reprojection_error: float  # Mean error in pixels
```
### Sim3Transform
```python
class Sim3Transform(BaseModel):
    # np.ndarray fields require opting in to arbitrary types (pydantic v2)
    model_config = ConfigDict(arbitrary_types_allowed=True)

    translation: np.ndarray  # (3,) translation vector
    rotation: np.ndarray  # (3, 3) rotation matrix or (4,) quaternion
    scale: float  # Scale factor
```
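When the homography's perspective terms are negligible (nadir imagery), the translation/rotation/scale parts can be read off H directly. This is a planar sketch of the decomposition, not a full Sim(3) solver:

```python
import numpy as np

def sim_from_homography(H: np.ndarray):
    """Approximate (translation, rotation angle, scale) from a homography
    whose perspective row is close to [0, 0, 1].

    Scale is sqrt(|det|) of the upper-left 2x2 block; rotation is the angle
    of its first column; translation is the last column.
    """
    H = H / H[2, 2]  # normalize so H[2, 2] == 1
    A = H[:2, :2]
    scale = float(np.sqrt(abs(np.linalg.det(A))))
    angle = float(np.arctan2(A[1, 0], A[0, 0]))
    translation = H[:2, 2].copy()
    return translation, angle, scale
```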