component decomposition is done

This commit is contained in:
Oleksandr Bezdieniezhnykh
2025-11-24 14:09:23 +02:00
parent acec83018b
commit f50006d100
34 changed files with 8637 additions and 0 deletions
# Sequential Visual Odometry
## Interface Definition
**Interface Name**: `ISequentialVO`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np

# Features, Matches, Motion, RelativePose and CameraParameters are the data
# models defined later in this document.

class ISequentialVO(ABC):
    @abstractmethod
    def compute_relative_pose(self, prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]:
        pass

    @abstractmethod
    def extract_features(self, image: np.ndarray) -> Features:
        pass

    @abstractmethod
    def match_features(self, features1: Features, features2: Features) -> Matches:
        pass

    @abstractmethod
    def estimate_motion(self, matches: Matches, camera_params: CameraParameters) -> Optional[Motion]:
        pass
```
## Component Description
### Responsibilities
- Extract SuperPoint features from UAV images
- Match features between consecutive frames with LightGlue
- Handle <5% overlap scenarios
- Estimate relative pose (translation + rotation) between frames
- Return relative pose factors to the Factor Graph Optimizer
- Detect tracking loss (low inlier count)
### Scope
- Frame-to-frame visual odometry
- Feature-based motion estimation
- Handles low overlap and challenging agricultural environments
- Provides relative measurements for trajectory optimization
## API Methods
### `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
**Description**: Computes relative camera pose between consecutive frames.
**Called By**:
- Main processing loop (per-frame)
**Input**:
```python
prev_image: np.ndarray # Previous frame (t-1)
curr_image: np.ndarray # Current frame (t)
```
**Output**:
```python
RelativePose:
translation: np.ndarray # (x, y, z) in meters
rotation: np.ndarray # 3×3 rotation matrix or quaternion
confidence: float # 0.0 to 1.0
inlier_count: int
total_matches: int
tracking_good: bool
```
**Processing Flow**:
1. `extract_features(prev_image)` → `features1`
2. `extract_features(curr_image)` → `features2`
3. `match_features(features1, features2)` → `matches`
4. `estimate_motion(matches, camera_params)` → `motion`
5. Assemble and return `RelativePose`
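The flow above can be sketched as a plain orchestration function with the four stages injected as callables. This is an illustration only; the function name and the dict return value stand in for the component's actual `RelativePose` assembly:

```python
from typing import Callable, Optional

def run_vo_pipeline(prev_image, curr_image,
                    extract: Callable, match: Callable,
                    estimate: Callable) -> Optional[dict]:
    """Orchestrate the five steps above; error handling simplified."""
    features1 = extract(prev_image)          # step 1
    features2 = extract(curr_image)          # step 2
    matches = match(features1, features2)    # step 3
    motion = estimate(matches)               # step 4
    if motion is None:
        return None                          # tracking lost
    # Step 5: the real component assembles a RelativePose here.
    return {"motion": motion, "total_matches": len(matches)}
```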
**Tracking Quality Indicators**:
- **Good tracking**: inlier_count > 50, inlier_ratio > 0.5
- **Degraded tracking**: inlier_count 20-50
- **Tracking loss**: inlier_count < 20
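These thresholds can be captured in a small helper; the function name and labels below are hypothetical, the cut-offs follow the indicators above:

```python
def classify_tracking(inlier_count: int, total_matches: int) -> str:
    """Map inlier statistics to a tracking-quality label (illustrative)."""
    inlier_ratio = inlier_count / total_matches if total_matches else 0.0
    if inlier_count > 50 and inlier_ratio > 0.5:
        return "good"
    if inlier_count >= 20:
        return "degraded"
    return "lost"
```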
**Error Conditions**:
- Returns `None`: Tracking lost (insufficient matches)
**Test Cases**:
1. **Good overlap (>50%)**: Returns reliable pose
2. **Low overlap (5-10%)**: Still succeeds with LightGlue
3. **<5% overlap**: May return None (tracking loss)
4. **Agricultural texture**: Handles repetitive patterns
---
### `extract_features(image: np.ndarray) -> Features`
**Description**: Extracts SuperPoint keypoints and descriptors from image.
**Called By**:
- Internal (during compute_relative_pose)
- G08 Global Place Recognition (for descriptor caching)
**Input**:
```python
image: np.ndarray # Input image (H×W×3 or H×W)
```
**Output**:
```python
Features:
keypoints: np.ndarray # (N, 2) - (x, y) coordinates
descriptors: np.ndarray # (N, 256) - 256-dim descriptors
scores: np.ndarray # (N,) - detection confidence scores
```
**Processing Details**:
- Uses G15 Model Manager to get SuperPoint model
- Converts to grayscale if needed
- Non-maximum suppression for keypoint selection
- Typically extracts 500-2000 keypoints per image
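The keypoint-selection step can be illustrated with a greedy non-maximum suppression over detection scores. This is a sketch only; SuperPoint's actual NMS operates on its dense score map, not on a keypoint list:

```python
import numpy as np

def nms_keypoints(keypoints: np.ndarray, scores: np.ndarray,
                  radius: float = 4.0, max_keypoints: int = 2000):
    """Greedy NMS: keep the strongest keypoint, drop neighbours
    within `radius` pixels, repeat until max_keypoints are kept."""
    order = np.argsort(-scores)  # strongest first
    kept = []
    for i in order:
        if all(np.linalg.norm(keypoints[i] - keypoints[j]) >= radius
               for j in kept):
            kept.append(i)
        if len(kept) == max_keypoints:
            break
    kept = np.array(kept, dtype=int)
    return keypoints[kept], scores[kept]
```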
**Performance**:
- Inference time: ~15ms with TensorRT on RTX 2060
**Error Conditions**:
- Never fails (returns empty features if image invalid)
**Test Cases**:
1. **FullHD image**: Extracts ~1000 keypoints
2. **High-res image (6252×4168)**: Extracts ~2000 keypoints
3. **Low-texture image**: Extracts fewer keypoints
---
### `match_features(features1: Features, features2: Features) -> Matches`
**Description**: Matches features using LightGlue attention-based matcher.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
features1: Features # Previous frame features
features2: Features # Current frame features
```
**Output**:
```python
Matches:
matches: np.ndarray # (M, 2) - indices [idx1, idx2]
scores: np.ndarray # (M,) - match confidence scores
keypoints1: np.ndarray # (M, 2) - matched keypoints from frame 1
keypoints2: np.ndarray # (M, 2) - matched keypoints from frame 2
```
**Processing Details**:
- Uses G15 Model Manager to get LightGlue model
- Transformer-based attention mechanism
- "Dustbin" mechanism for unmatched features
- Adaptive depth (exits early for easy matches)
- **Critical**: Retains valid correspondences at <5% overlap, where classical nearest-neighbour matching leaves too few inliers for downstream RANSAC
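For contrast with the learned matcher, the classical baseline it replaces can be sketched as mutual nearest-neighbour matching on (unit-normalized) descriptors. LightGlue substitutes this with transformer attention plus a dustbin; the code below is the baseline only:

```python
import numpy as np

def mutual_nn_match(desc1: np.ndarray, desc2: np.ndarray,
                    min_score: float = 0.0):
    """Mutual nearest-neighbour matching on unit-normalized descriptors."""
    sim = desc1 @ desc2.T        # (N1, N2) cosine similarities
    nn12 = sim.argmax(axis=1)    # best partner in image 2 for each kp in image 1
    nn21 = sim.argmax(axis=0)    # best partner in image 1 for each kp in image 2
    idx1 = np.arange(desc1.shape[0])
    mutual = nn21[nn12] == idx1  # keep only pairs that agree both ways
    scores = sim[idx1, nn12]
    keep = mutual & (scores >= min_score)
    matches = np.stack([idx1[keep], nn12[keep]], axis=1)
    return matches, scores[keep]
```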
**Performance**:
- Inference time: ~35-100ms (adaptive depth)
- Faster for high-overlap, slower for low-overlap
**Test Cases**:
1. **High overlap**: Fast matching (~35ms), 500+ matches
2. **Low overlap (<5%)**: Slower (~100ms), 20-50 matches
3. **No overlap**: Few or no matches (< 10)
---
### `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`
**Description**: Estimates camera motion from matched keypoints using Essential Matrix.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
matches: Matches
camera_params: CameraParameters

CameraParameters:
    focal_length: float                   # in pixels
    principal_point: Tuple[float, float]  # (cx, cy) in pixels
    resolution: Tuple[int, int]           # (width, height)
```
**Output**:
```python
Motion:
translation: np.ndarray # (x, y, z) - unit vector (scale ambiguous)
rotation: np.ndarray # 3×3 rotation matrix
inliers: np.ndarray # Boolean mask of inlier matches
inlier_count: int
```
**Algorithm**:
1. Normalize keypoint coordinates using camera intrinsics
2. Estimate Essential Matrix using RANSAC
3. Decompose Essential Matrix → [R, t]
4. Return motion with inlier mask
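Step 1 can be sketched directly (assuming square pixels and no lens distortion); steps 2-3 correspond to OpenCV's `cv2.findEssentialMat` and `cv2.recoverPose`:

```python
import numpy as np

def normalize_keypoints(kpts: np.ndarray, focal_length: float,
                        principal_point: tuple) -> np.ndarray:
    """Map pixel coordinates (u, v) to normalized image coordinates:
    x_n = (u - cx) / f,  y_n = (v - cy) / f."""
    cx, cy = principal_point
    return (kpts - np.array([cx, cy])) / focal_length
```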
**Scale Ambiguity**:
- Monocular VO has inherent scale ambiguity
- Translation is unit vector (direction only)
- Scale resolved by:
- Altitude prior (from G10 Factor Graph)
- Absolute GPS measurements (from G09 LiteSAM)
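The GPS route to scale resolution can be sketched as follows (hypothetical helper; the altitude-prior route depends on the G10 factor graph formulation and is not shown):

```python
import numpy as np

def resolve_scale_from_gps(t_unit: np.ndarray,
                           gps_delta: np.ndarray) -> np.ndarray:
    """Recover a metric translation by scaling the unit direction with the
    magnitude of the absolute GPS displacement between the two frames."""
    return t_unit * np.linalg.norm(gps_delta)
```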
**Error Conditions**:
- Returns `None`: Insufficient inliers (below the Essential Matrix solver's minimum: 5 correspondences for the five-point algorithm, 8 for the classical eight-point algorithm)
**Test Cases**:
1. **Good matches**: Returns motion with high inlier count
2. **Low inliers**: May return None
3. **Degenerate motion**: Pure rotation (near-zero translation) is a degenerate case for Essential Matrix decomposition; verify it is detected rather than producing a spurious translation direction
## Integration Tests
### Test 1: Normal Flight Sequence
1. Load consecutive frames with 50% overlap
2. compute_relative_pose() → returns valid pose
3. Verify translation direction reasonable
4. Verify inlier_count > 100
### Test 2: Low Overlap Scenario
1. Load frames with 5% overlap
2. compute_relative_pose() → still succeeds
3. Verify inlier_count > 20
4. Verify LightGlue finds matches despite low overlap
### Test 3: Tracking Loss
1. Load frames with 0% overlap (sharp turn)
2. compute_relative_pose() → returns None
3. Verify tracking_good = False
4. Trigger global place recognition
### Test 4: Agricultural Texture
1. Load images of wheat fields (repetitive texture)
2. compute_relative_pose() → succeeds; SuperPoint handles repetitive texture better than SIFT
3. Verify match quality
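Test 3 can be written pytest-style against any `ISequentialVO` implementation. The stub below stands in for a real implementation purely to make the shape of the test concrete; all behaviour and names here are hypothetical:

```python
import numpy as np

class StubVO:
    """Stand-in for an ISequentialVO implementation (fake behaviour)."""
    def compute_relative_pose(self, prev_image, curr_image):
        # Pretend tracking fails when the frames share no pixels at all.
        overlap = float(np.mean(prev_image == curr_image))
        return None if overlap < 0.01 else {"tracking_good": True}

def test_tracking_loss():
    vo = StubVO()
    prev = np.zeros((64, 64), dtype=np.uint8)
    curr = np.full((64, 64), 255, dtype=np.uint8)  # 0% overlap (sharp turn)
    pose = vo.compute_relative_pose(prev, curr)
    assert pose is None  # tracking lost -> trigger global place recognition
```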
## Non-Functional Requirements
### Performance
- **compute_relative_pose**: < 200ms total
- SuperPoint extraction: ~15ms × 2 = 30ms
- LightGlue matching: ~50ms
- Motion estimation: ~10ms
- **Frame rate**: 5-10 FPS processing (meets <5s requirement)
### Accuracy
- **Relative rotation**: ±2° error
- **Relative translation direction**: ±5° error
- **Inlier ratio**: >50% for good tracking
### Reliability
- Handle 100m spacing between frames
- Survive temporary tracking degradation
- Recover from brief occlusions
## Dependencies
### Internal Components
- **G15 Model Manager**: For SuperPoint and LightGlue models
- **G16 Configuration Manager**: For camera parameters
- **H01 Camera Model**: For coordinate normalization
- **H05 Performance Monitor**: For timing measurements
### External Dependencies
- **SuperPoint**: Feature extraction model
- **LightGlue**: Feature matching model
- **opencv-python**: Essential Matrix estimation
- **numpy**: Matrix operations
## Data Models
### Features
```python
class Features(BaseModel):
    # np.ndarray fields require the model to allow arbitrary types
    # (e.g. Pydantic's arbitrary_types_allowed)
keypoints: np.ndarray # (N, 2)
descriptors: np.ndarray # (N, 256)
scores: np.ndarray # (N,)
```
### Matches
```python
class Matches(BaseModel):
matches: np.ndarray # (M, 2) - pairs of indices
scores: np.ndarray # (M,) - match confidence
keypoints1: np.ndarray # (M, 2)
keypoints2: np.ndarray # (M, 2)
```
### RelativePose
```python
class RelativePose(BaseModel):
translation: np.ndarray # (3,) - unit vector
rotation: np.ndarray # (3, 3) or (4,) quaternion
confidence: float
inlier_count: int
total_matches: int
tracking_good: bool
scale_ambiguous: bool = True
```
### Motion
```python
class Motion(BaseModel):
translation: np.ndarray # (3,)
rotation: np.ndarray # (3, 3)
inliers: np.ndarray # Boolean mask
inlier_count: int
```