Metric Refinement

Interface Definition

Interface Name: IMetricRefinement

Interface Methods

from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

import numpy as np

# GPSPoint, TileBounds, AlignmentResult, and ChunkAlignmentResult are defined
# in the Data Models section below.

class IMetricRefinement(ABC):
    @abstractmethod
    def align_to_satellite(self, uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]:
        pass
    
    @abstractmethod
    def compute_homography(self, uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass
    
    @abstractmethod
    def extract_gps_from_alignment(self, homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint:
        pass
    
    @abstractmethod
    def compute_match_confidence(self, alignment: AlignmentResult) -> float:
        pass
    
    @abstractmethod
    def align_chunk_to_satellite(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]:
        pass
    
    @abstractmethod
    def match_chunk_homography(self, chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]:
        pass

Component Description

Responsibilities

  • Run LiteSAM for precise UAV-to-satellite cross-view matching
  • Require pre-rotated images from the Image Rotation Manager (F06)
  • Compute the homography mapping the UAV image to the satellite tile
  • Extract absolute GPS coordinates from the alignment
  • Process against a single tile (drift correction) or a tile grid (progressive search)
  • Meet the <20m accuracy requirement
  • Perform chunk-to-satellite matching (more robust than single-image matching)
  • Compute chunk homographies

Scope

  • Cross-view geo-localization (UAV↔satellite)
  • Handles altitude variations (<1km)
  • Multi-scale processing for different GSDs
  • Bridges the domain gap between the UAV's downward view and the satellite's nadir view
  • Critical: fails if rotation error exceeds 45° (pre-rotation handled by F06)
  • Chunk-level matching (aggregates correspondences from multiple images)

API Methods

align_to_satellite(uav_image: np.ndarray, satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[AlignmentResult]

Description: Aligns UAV image to satellite tile, returning GPS location.

Called By:

  • F06 Image Rotation Manager (during rotation sweep)
  • F11 Failure Recovery Coordinator (progressive search)
  • F02.2 Flight Processing Engine (drift correction with single tile)

Input:

uav_image: np.ndarray  # Pre-rotated UAV image
satellite_tile: np.ndarray  # Reference satellite tile
tile_bounds: TileBounds  # GPS bounds and GSD of the satellite tile

Output:

AlignmentResult:
    matched: bool
    homography: np.ndarray  # 3×3 transformation matrix
    gps_center: GPSPoint  # UAV image center GPS
    confidence: float
    inlier_count: int
    total_correspondences: int

Processing Flow:

  1. Extract features from both images using LiteSAM encoder
  2. Compute dense correspondence field
  3. Estimate homography from correspondences
  4. Validate match quality (inlier count, reprojection error)
  5. If valid match:
    • Extract GPS from homography using tile_bounds
    • Return AlignmentResult
  6. If no match:
    • Return None

Match Criteria:

  • Good match: inlier_count > 30, confidence > 0.7
  • Weak match: inlier_count 15-30, confidence 0.5-0.7
  • No match: inlier_count < 15
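
These criteria can be folded into a tiny classifier; a sketch (`classify_match` is an illustrative helper, not part of the interface, and the handling of combinations the spec leaves open, e.g. many inliers with low confidence, is an assumption):

```python
def classify_match(inlier_count: int, confidence: float) -> str:
    """Bucket an alignment by the single-image match criteria above."""
    if inlier_count > 30 and confidence > 0.7:
        return "good"
    if inlier_count >= 15 and confidence >= 0.5:
        return "weak"
    return "none"
```

A "none" result corresponds to align_to_satellite returning None; "weak" matches may still be useful to F11 during progressive search.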

Error Conditions:

  • Returns None: no match found, or rotation >45° (images must be pre-rotated by the caller)

Test Cases:

  1. Good alignment: Returns GPS within 20m of ground truth
  2. Altitude variation: Handles GSD mismatch
  3. Rotation >45°: Fails (by design, requires pre-rotation)
  4. Multi-scale: Processes at multiple scales

compute_homography(uav_image: np.ndarray, satellite_tile: np.ndarray) -> Optional[np.ndarray]

Description: Computes homography transformation from UAV to satellite.

Called By:

  • Internal (during align_to_satellite)

Input:

uav_image: np.ndarray
satellite_tile: np.ndarray

Output:

Optional[np.ndarray]: 3×3 homography matrix or None

Algorithm (LiteSAM):

  1. Extract multi-scale features using TAIFormer
  2. Compute correlation via Convolutional Token Mixer (CTM)
  3. Generate dense correspondences
  4. Estimate homography using RANSAC
  5. Refine with non-linear optimization

Homography Properties:

  • Maps pixels from UAV image to satellite image
  • Accounts for: scale, rotation, perspective
  • 8 DoF (degrees of freedom)

Error Conditions:

  • Returns None: Insufficient correspondences

Test Cases:

  1. Valid correspondence: Returns 3×3 matrix
  2. Insufficient features: Returns None

extract_gps_from_alignment(homography: np.ndarray, tile_bounds: TileBounds, image_center: Tuple[int, int]) -> GPSPoint

Description: Extracts GPS coordinates from homography and tile georeferencing.

Called By:

  • Internal (during align_to_satellite)
  • F06 Image Rotation Manager (for precise angle calculation)

Input:

homography: np.ndarray  # 3×3 matrix
tile_bounds: TileBounds  # GPS bounds of satellite tile
image_center: Tuple[int, int]  # Center pixel of UAV image

Output:

GPSPoint:
    lat: float
    lon: float

Algorithm:

  1. Apply homography to UAV image center point
  2. Get pixel coordinates in satellite tile
  3. Convert satellite pixel to GPS using tile_bounds and GSD
  4. Return GPS coordinates

Uses: tile_bounds parameter, H02 GSD Calculator
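
The four steps reduce to one homogeneous transform plus linear interpolation across the tile bounds; a sketch assuming a north-up, linearly georeferenced tile (`gps_from_homography` and its argument layout are illustrative, not the interface signature):

```python
import numpy as np

def gps_from_homography(homography, tile_nw, tile_se, tile_size_px, image_center):
    """Map the UAV image center through H, then convert the resulting
    satellite pixel to GPS by linear interpolation across the tile bounds.

    tile_nw / tile_se: (lat, lon) of the tile's NW and SE corners.
    tile_size_px: (width, height) of the satellite tile in pixels.
    """
    # Steps 1-2: apply the homography to the center point (homogeneous coords).
    cx, cy = image_center
    p = homography @ np.array([cx, cy, 1.0])
    px, py = p[0] / p[2], p[1] / p[2]
    # Step 3: linear pixel -> GPS; lat decreases downward, lon increases rightward.
    nw_lat, nw_lon = tile_nw
    se_lat, se_lon = tile_se
    w, h = tile_size_px
    lat = nw_lat + (py / h) * (se_lat - nw_lat)
    lon = nw_lon + (px / w) * (se_lon - nw_lon)
    return lat, lon
```

In practice the GSD-aware conversion is delegated to H02; the linear interpolation here is equivalent for a uniform-GSD tile.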

Test Cases:

  1. Center alignment: UAV center → correct GPS
  2. Corner alignment: UAV corner → correct GPS
  3. Multiple points: All points consistent

compute_match_confidence(alignment: AlignmentResult) -> float

Description: Computes match confidence score from alignment quality.

Called By:

  • Internal (during align_to_satellite)
  • F11 Failure Recovery Coordinator (to decide if match acceptable)

Input:

alignment: AlignmentResult

Output:

float: Confidence score (0.0 to 1.0)

Confidence Factors:

  1. Inlier ratio: inliers / total_correspondences
  2. Inlier count: Absolute number of inliers
  3. Reprojection error: Mean error of inliers (in pixels)
  4. Spatial distribution: Inliers well-distributed vs clustered

Thresholds:

  • High confidence (>0.8): inlier_ratio > 0.6, inlier_count > 50, mean reprojection error (MRE) < 0.5px
  • Medium confidence (0.5-0.8): inlier_ratio > 0.4, inlier_count > 30
  • Low confidence (<0.5): Reject match
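
One way to combine the factors into a single score; a sketch in which the weights and saturation points are illustrative assumptions (the spec only fixes the factors and the accept/reject thresholds):

```python
def compute_match_confidence(inliers: int, total: int, mean_reproj_err: float) -> float:
    """Combine inlier ratio, inlier count, and reprojection error into 0..1."""
    if total == 0:
        return 0.0
    ratio_term = min(inliers / total, 1.0)              # inlier ratio
    count_term = min(inliers / 50.0, 1.0)               # saturates at 50 inliers
    error_term = max(0.0, 1.0 - mean_reproj_err / 2.0)  # 0 at the 2px config max
    return 0.4 * ratio_term + 0.3 * count_term + 0.3 * error_term
```

The spatial-distribution factor is omitted here; it would typically down-weight matches whose inliers cluster in one corner of the image.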

Test Cases:

  1. Good match: confidence > 0.8
  2. Weak match: confidence 0.5-0.7
  3. Poor match: confidence < 0.5

align_chunk_to_satellite(chunk_images: List[np.ndarray], satellite_tile: np.ndarray, tile_bounds: TileBounds) -> Optional[ChunkAlignmentResult]

Description: Aligns entire chunk to satellite tile, returning GPS location.

Called By:

  • F06 Image Rotation Manager (during chunk rotation sweep)
  • F11 Failure Recovery Coordinator (chunk LiteSAM matching)

Input:

chunk_images: List[np.ndarray]  # Pre-rotated chunk images (5-20 images)
satellite_tile: np.ndarray  # Reference satellite tile
tile_bounds: TileBounds  # GPS bounds and GSD of the satellite tile

Output:

ChunkAlignmentResult:
    matched: bool
    chunk_id: str
    chunk_center_gps: GPSPoint  # GPS of chunk center (middle frame)
    rotation_angle: float
    confidence: float
    inlier_count: int
    transform: Sim3Transform

Processing Flow:

  1. For each image in chunk:
    • Extract features using LiteSAM encoder
    • Compute correspondences with satellite tile
  2. Aggregate correspondences from all images
  3. Estimate homography from aggregate correspondences
  4. Validate match quality (inlier count, reprojection error)
  5. If valid match:
    • Extract GPS from chunk center using tile_bounds
    • Compute Sim(3) transform (translation, rotation, scale)
    • Return ChunkAlignmentResult
  6. If no match:
    • Return None
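
Step 2, the key difference from single-image matching, is plain concatenation of per-image correspondence sets; a sketch (`aggregate_correspondences` is an illustrative helper, not part of the interface):

```python
import numpy as np

def aggregate_correspondences(per_image_matches):
    """Concatenate (uav_pts, sat_pts) pairs from every chunk image into one
    correspondence set, so a single homography is fit against more evidence
    than any one image provides.

    per_image_matches: list of (Nx2, Nx2) arrays; images with no matches
    contribute nothing.
    """
    uav_all, sat_all = [], []
    for uav_pts, sat_pts in per_image_matches:
        if len(uav_pts):
            uav_all.append(uav_pts)
            sat_all.append(sat_pts)
    if not uav_all:
        return np.empty((0, 2)), np.empty((0, 2))
    return np.vstack(uav_all), np.vstack(sat_all)
```

The aggregated arrays then feed the same RANSAC estimation used for single images, which is why the chunk inlier thresholds below are higher.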

Match Criteria:

  • Good match: inlier_count > 50, confidence > 0.7
  • Weak match: inlier_count 30-50, confidence 0.5-0.7
  • No match: inlier_count < 30

Advantages over Single-Image Matching:

  • More correspondences (aggregate from multiple images)
  • More robust to featureless terrain
  • Better handles partial occlusions
  • Higher confidence scores

Test Cases:

  1. Chunk alignment: Returns GPS within 20m of ground truth
  2. Featureless terrain: Succeeds where single-image fails
  3. Rotation >45°: Fails (requires pre-rotation via F06)
  4. Multi-scale: Handles GSD mismatch

match_chunk_homography(chunk_images: List[np.ndarray], satellite_tile: np.ndarray) -> Optional[np.ndarray]

Description: Computes homography transformation from chunk to satellite.

Called By:

  • Internal (during align_chunk_to_satellite)

Input:

chunk_images: List[np.ndarray]
satellite_tile: np.ndarray

Output:

Optional[np.ndarray]: 3×3 homography matrix or None

Algorithm (LiteSAM):

  1. Extract multi-scale features from all chunk images using TAIFormer
  2. Aggregate features (mean or max pooling)
  3. Compute correlation via Convolutional Token Mixer (CTM)
  4. Generate dense correspondences
  5. Estimate homography using RANSAC
  6. Refine with non-linear optimization

Homography Properties:

  • Maps pixels from chunk center to satellite image
  • Accounts for: scale, rotation, perspective
  • 8 DoF (degrees of freedom)

Test Cases:

  1. Valid correspondence: Returns 3×3 matrix
  2. Insufficient features: Returns None
  3. Aggregate correspondences: More robust than single-image

Integration Tests

Test 1: Single Tile Drift Correction

  1. Load UAV image and expected satellite tile
  2. Pre-rotate UAV image to known heading
  3. align_to_satellite() → returns GPS
  4. Verify GPS within 20m of ground truth

Test 2: Progressive Search (4 tiles)

  1. Load UAV image from sharp turn
  2. Get 2×2 tile grid from F04
  3. align_to_satellite() for each tile (with tile_bounds)
  4. First 3 tiles: No match
  5. 4th tile: Match found → GPS extracted

Test 3: Rotation Sensitivity

  1. Rotate UAV image by 60° (not pre-rotated)
  2. align_to_satellite() → returns None (fails as expected)
  3. Pre-rotate to 60°
  4. align_to_satellite() → succeeds

Test 4: Multi-Scale Robustness

  1. UAV at 500m altitude (GSD=0.1m/pixel)
  2. Satellite at zoom 19 (GSD=0.3m/pixel)
  3. LiteSAM handles scale difference → match succeeds

Test 5: Chunk LiteSAM Matching

  1. Build chunk with 10 images (plain field scenario)
  2. Pre-rotate chunk to known heading
  3. align_chunk_to_satellite() → returns GPS
  4. Verify GPS within 20m of ground truth
  5. Verify chunk matching more robust than single-image

Test 6: Chunk Rotation Sweeps

  1. Build chunk with unknown orientation
  2. Try chunk rotation steps (0°, 30°, ..., 330°)
  3. align_chunk_to_satellite() for each rotation
  4. Match found at 120° → GPS extracted
  5. Verify Sim(3) transform computed correctly

Non-Functional Requirements

Performance

  • align_to_satellite: ~60ms per tile (TensorRT optimized)
  • Progressive search 25 tiles: ~1.5 seconds total (25 × 60ms)
  • Meets <5s per frame requirement

Accuracy

  • GPS accuracy: 60% of frames < 20m error, 80% < 50m error
  • Mean Reprojection Error (MRE): < 1.0 pixels
  • Alignment success rate: > 95% when rotation correct

Reliability

  • Graceful failure when no match
  • Robust to altitude variations (<1km)
  • Handles seasonal appearance changes (to extent possible)

Dependencies

Internal Components

  • F12 Route Chunk Manager: For chunk image retrieval and chunk operations
  • F16 Model Manager: For LiteSAM model
  • H01 Camera Model: For projection operations
  • H02 GSD Calculator: For coordinate transformations
  • H05 Performance Monitor: For timing

Critical Dependency on F06 Image Rotation Manager:

  • F09 requires pre-rotated images (rotation <45° from north)
  • Caller (F06 or F11) must pre-rotate images using F06.rotate_image_360() before calling F09.align_to_satellite()
  • If rotation >45°, F09 will fail to match (by design)
  • F06 handles the rotation sweep (trying 0°, 30°, 60°, etc.) and calls F09 for each rotation
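
The division of labour above can be sketched as a sweep loop; `rotate_image_360` and `align_to_satellite` stand in for the F06 and F09 methods named in this spec, and the 0.7 acceptance threshold is taken from the good-match criterion:

```python
def rotation_sweep(rotate_image_360, align_to_satellite, uav_image,
                   satellite_tile, tile_bounds, step_deg=30):
    """F06-side sweep: try each coarse rotation (0°, 30°, ..., 330°) and
    return the first rotation whose alignment is accepted, else (None, None)."""
    for angle in range(0, 360, step_deg):
        rotated = rotate_image_360(uav_image, angle)
        result = align_to_satellite(rotated, satellite_tile, tile_bounds)
        if result is not None and result.confidence > 0.7:
            return angle, result
    return None, None
```

The same pattern applies to the chunk variant, with align_chunk_to_satellite called per rotation step.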

Note: tile_bounds is passed as parameter from caller (F02.2 Flight Processing Engine gets it from F04 Satellite Data Manager)

External Dependencies

  • LiteSAM: Cross-view matching model
  • opencv-python: Homography estimation
  • numpy: Matrix operations

Data Models

AlignmentResult

# Data models use pydantic; ndarray-typed fields need arbitrary types enabled.
from pydantic import BaseModel, ConfigDict

class AlignmentResult(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    matched: bool
    homography: np.ndarray  # (3, 3)
    gps_center: GPSPoint
    confidence: float
    inlier_count: int
    total_correspondences: int
    reprojection_error: float  # Mean error in pixels

GPSPoint

class GPSPoint(BaseModel):
    lat: float
    lon: float

TileBounds

class TileBounds(BaseModel):
    nw: GPSPoint
    ne: GPSPoint
    sw: GPSPoint
    se: GPSPoint
    center: GPSPoint
    gsd: float  # Ground Sampling Distance (m/pixel)

LiteSAMConfig

class LiteSAMConfig(BaseModel):
    model_path: str
    confidence_threshold: float = 0.7
    min_inliers: int = 15
    max_reprojection_error: float = 2.0  # pixels
    multi_scale_levels: int = 3
    chunk_min_inliers: int = 30  # Higher threshold for chunk matching

ChunkAlignmentResult

class ChunkAlignmentResult(BaseModel):
    matched: bool
    chunk_id: str
    chunk_center_gps: GPSPoint
    rotation_angle: float
    confidence: float
    inlier_count: int
    transform: Sim3Transform  # Translation, rotation, scale
    reprojection_error: float  # Mean error in pixels

Sim3Transform

class Sim3Transform(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)  # np.ndarray fields
    translation: np.ndarray  # (3,) - translation vector
    rotation: np.ndarray  # (3, 3) rotation matrix or (4,) quaternion
    scale: float  # Scale factor