add chunking

Oleksandr Bezdieniezhnykh
2025-11-27 03:43:19 +02:00
# Sequential Visual Odometry
## Interface Definition
**Interface Name**: `ISequentialVO`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np

# Features, Matches, Motion, RelativePose, and CameraParameters
# are defined under Data Models below.


class ISequentialVO(ABC):
    @abstractmethod
    def compute_relative_pose(self, prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]:
        pass

    @abstractmethod
    def extract_features(self, image: np.ndarray) -> Features:
        pass

    @abstractmethod
    def match_features(self, features1: Features, features2: Features) -> Matches:
        pass

    @abstractmethod
    def estimate_motion(self, matches: Matches, camera_params: CameraParameters) -> Optional[Motion]:
        pass

    @abstractmethod
    def compute_relative_pose_in_chunk(self, prev_image: np.ndarray, curr_image: np.ndarray, chunk_id: str) -> Optional[RelativePose]:
        pass
```
## Component Description
### Responsibilities
- SuperPoint feature extraction from UAV images
- LightGlue feature matching between consecutive frames
- Handle <5% overlap scenarios
- Estimate relative pose (translation + rotation) between frames
- Return relative pose factors for Factor Graph Optimizer
- Detect tracking loss (low inlier count)
- **Chunk-aware VO operations (factors added to chunk subgraph)**
### Scope
- Frame-to-frame visual odometry
- Feature-based motion estimation
- Handles low overlap and challenging agricultural environments
- Provides relative measurements for trajectory optimization
- **Chunk-scoped operations (Atlas multi-map architecture)**
## API Methods
### `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
**Description**: Computes relative camera pose between consecutive frames.
**Called By**:
- Main processing loop (per-frame)
**Input**:
```python
prev_image: np.ndarray # Previous frame (t-1)
curr_image: np.ndarray # Current frame (t)
```
**Output**:
```python
RelativePose:
    translation: np.ndarray  # (x, y, z) unit direction vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix or quaternion
    confidence: float        # 0.0 to 1.0
    inlier_count: int
    total_matches: int
    tracking_good: bool
```
**Processing Flow**:
1. extract_features(prev_image) → features1
2. extract_features(curr_image) → features2
3. match_features(features1, features2) → matches
4. estimate_motion(matches, camera_params) → motion
5. Return RelativePose
**Tracking Quality Indicators**:
- **Good tracking**: inlier_count > 50, inlier_ratio > 0.5
- **Degraded tracking**: inlier_count 20-50
- **Tracking loss**: inlier_count < 20
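The thresholds above can be captured in a small helper; a sketch, where the function name and the combined inlier-ratio check for "good" are illustrative:

```python
def classify_tracking(inlier_count: int, total_matches: int) -> str:
    """Map inlier statistics to the tracking-quality states defined above."""
    inlier_ratio = inlier_count / total_matches if total_matches else 0.0
    if inlier_count > 50 and inlier_ratio > 0.5:
        return "good"      # reliable pose, factor can be trusted
    if inlier_count >= 20:
        return "degraded"  # usable, but confidence should be lowered
    return "lost"          # compute_relative_pose should return None
```

`tracking_good` in the returned `RelativePose` is then simply `classify_tracking(...) == "good"`.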
**Error Conditions**:
- Returns `None`: Tracking lost (insufficient matches)
**Test Cases**:
1. **Good overlap (>50%)**: Returns reliable pose
2. **Low overlap (5-10%)**: Still succeeds with LightGlue
3. **<5% overlap**: May return None (tracking loss)
4. **Agricultural texture**: Handles repetitive patterns
---
### `extract_features(image: np.ndarray) -> Features`
**Description**: Extracts SuperPoint keypoints and descriptors from image.
**Called By**:
- Internal (during compute_relative_pose)
- F08 Global Place Recognition (for descriptor caching)
**Input**:
```python
image: np.ndarray # Input image (H×W×3 or H×W)
```
**Output**:
```python
Features:
    keypoints: np.ndarray    # (N, 2) - (x, y) coordinates
    descriptors: np.ndarray  # (N, 256) - 256-dim descriptors
    scores: np.ndarray       # (N,) - detection confidence scores
```
**Processing Details**:
- Uses F16 Model Manager to get SuperPoint model
- Converts to grayscale if needed
- Non-maximum suppression for keypoint selection
- Typically extracts 500-2000 keypoints per image
**Performance**:
- Inference time: ~15ms with TensorRT on RTX 2060
**Error Conditions**:
- Never fails (returns empty features if image invalid)
**Test Cases**:
1. **FullHD image**: Extracts ~1000 keypoints
2. **High-res image (6252×4168)**: Extracts ~2000 keypoints
3. **Low-texture image**: Extracts fewer keypoints
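The preprocessing and the never-fails contract can be sketched as follows. SuperPoint inference itself comes from F16 Model Manager and is out of scope here; the function names and the dict layout for `Features` are illustrative:

```python
import numpy as np

def preprocess_for_superpoint(image):
    """Convert to grayscale float32 in [0, 1]; return None for unusable input."""
    if image is None or not isinstance(image, np.ndarray) or image.size == 0:
        return None
    if image.ndim == 3 and image.shape[2] == 3:
        # H×W×3 → H×W grayscale (ITU-R BT.601 luma weights)
        image = image @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    elif image.ndim != 2:
        return None
    return image.astype(np.float32) / 255.0

def empty_features():
    """Fallback so extract_features never fails on invalid input."""
    return {
        "keypoints": np.empty((0, 2), dtype=np.float32),
        "descriptors": np.empty((0, 256), dtype=np.float32),
        "scores": np.empty((0,), dtype=np.float32),
    }
```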
---
### `match_features(features1: Features, features2: Features) -> Matches`
**Description**: Matches features using LightGlue attention-based matcher.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
features1: Features # Previous frame features
features2: Features # Current frame features
```
**Output**:
```python
Matches:
    matches: np.ndarray     # (M, 2) - indices [idx1, idx2]
    scores: np.ndarray      # (M,) - match confidence scores
    keypoints1: np.ndarray  # (M, 2) - matched keypoints from frame 1
    keypoints2: np.ndarray  # (M, 2) - matched keypoints from frame 2
```
**Processing Details**:
- Uses F16 Model Manager to get LightGlue model
- Transformer-based attention mechanism
- "Dustbin" mechanism for unmatched features
- Adaptive depth (exits early for easy matches)
- **Critical**: Handles <5% overlap far better than classical nearest-neighbour matching
**Performance**:
- Inference time: ~35-100ms (adaptive depth)
- Faster for high-overlap, slower for low-overlap
**Test Cases**:
1. **High overlap**: Fast matching (~35ms), 500+ matches
2. **Low overlap (<5%)**: Slower (~100ms), 20-50 matches
3. **No overlap**: Few or no matches (< 10)
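LightGlue is a learned matcher loaded through F16. As a classical stand-in that produces the same `Matches` layout, mutual nearest-neighbour matching over the 256-dim descriptors can be sketched as below; this is *not* the actual matcher and will not survive <5% overlap the way LightGlue does:

```python
import numpy as np

def mutual_nn_match(desc1, desc2, min_score=0.0):
    """Mutual nearest-neighbour matching on L2-normalised descriptors."""
    sim = desc1 @ desc2.T       # (N1, N2) cosine similarities
    nn12 = sim.argmax(axis=1)   # best image-2 index for each image-1 descriptor
    nn21 = sim.argmax(axis=0)   # best image-1 index for each image-2 descriptor
    idx1 = np.arange(desc1.shape[0])
    # keep only pairs that agree in both directions and clear the score floor
    keep = (nn21[nn12] == idx1) & (sim[idx1, nn12] > min_score)
    matches = np.stack([idx1[keep], nn12[keep]], axis=1)  # (M, 2) index pairs
    return matches, sim[idx1, nn12][keep]                 # (M,) confidence scores
```

`keypoints1`/`keypoints2` are then gathered as `features1.keypoints[matches[:, 0]]` and `features2.keypoints[matches[:, 1]]`.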
---
### `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`
**Description**: Estimates camera motion from matched keypoints using Essential Matrix.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
matches: Matches
camera_params: CameraParameters:
    focal_length: float
    principal_point: Tuple[float, float]
    resolution: Tuple[int, int]
```
**Output**:
```python
Motion:
    translation: np.ndarray  # (x, y, z) - unit vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix
    inliers: np.ndarray      # Boolean mask of inlier matches
    inlier_count: int
```
**Algorithm**:
1. Normalize keypoint coordinates using camera intrinsics
2. Estimate Essential Matrix using RANSAC
3. Decompose Essential Matrix → [R, t]
4. Return motion with inlier mask
**Scale Ambiguity**:
- Monocular VO has inherent scale ambiguity
- Translation is unit vector (direction only, magnitude = 1)
- **F07 does NOT resolve scale** - it only outputs unit translation vectors
- Scale resolution is handled by F10 Factor Graph Optimizer, which uses:
  - Altitude priors (soft constraints)
  - GSD-based expected displacement calculations (via H02)
  - Absolute GPS anchors from F09 Metric Refinement
**Error Conditions**:
- Returns `None`: Insufficient inliers (< 8 points for Essential Matrix)
**Test Cases**:
1. **Good matches**: Returns motion with high inlier count
2. **Low inliers**: May return None
3. **Degenerate motion**: Handles pure rotation
---
### `compute_relative_pose_in_chunk(prev_image: np.ndarray, curr_image: np.ndarray, chunk_id: str) -> Optional[RelativePose]`
**Description**: Computes relative camera pose between consecutive frames within a chunk context.
**Called By**:
- F02 Flight Processor (chunk-aware processing)
**Input**:
```python
prev_image: np.ndarray # Previous frame (t-1)
curr_image: np.ndarray # Current frame (t)
chunk_id: str # Chunk identifier for context
```
**Output**:
```python
RelativePose:
    translation: np.ndarray  # (x, y, z) unit direction vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix or quaternion
    confidence: float        # 0.0 to 1.0
    inlier_count: int
    total_matches: int
    tracking_good: bool
    chunk_id: str            # Chunk context
```
**Processing Flow**:
1. Same as compute_relative_pose() (SuperPoint + LightGlue)
2. Return RelativePose with chunk_id context
3. Factor will be added to chunk's subgraph (not global graph)
**Chunk Context**:
- VO operations are chunk-scoped
- Factors added to chunk's subgraph via F10.add_relative_factor_to_chunk()
- Chunk isolation ensures independent optimization
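The routing logic can be sketched as a thin wrapper; the class name is illustrative, `add_relative_factor_to_chunk` is the F10 method named above, and the pose is shown as a dict for brevity:

```python
class SequentialVOChunked:
    """Sketch: chunk-aware VO reuses the frame-to-frame pipeline and routes
    the resulting factor to the chunk's subgraph, not the global graph."""

    def __init__(self, vo, factor_graph):
        self.vo = vo                      # exposes compute_relative_pose(prev, curr)
        self.factor_graph = factor_graph  # exposes add_relative_factor_to_chunk(chunk_id, pose)

    def compute_relative_pose_in_chunk(self, prev_image, curr_image, chunk_id):
        pose = self.vo.compute_relative_pose(prev_image, curr_image)
        if pose is None:
            return None                   # tracking lost: no factor added anywhere
        pose["chunk_id"] = chunk_id       # attach chunk context
        self.factor_graph.add_relative_factor_to_chunk(chunk_id, pose)
        return pose
```

Because every factor is routed by `chunk_id`, two chunks never share VO factors, which is what the chunk-isolation test cases verify.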
**Test Cases**:
1. **Chunk-aware VO**: Returns RelativePose with chunk_id
2. **Chunk isolation**: Factors isolated to chunk
3. **Multiple chunks**: VO operations don't interfere between chunks
## Integration Tests
### Test 1: Normal Flight Sequence
1. Load consecutive frames with 50% overlap
2. compute_relative_pose() → returns valid pose
3. Verify translation direction reasonable
4. Verify inlier_count > 100
### Test 2: Low Overlap Scenario
1. Load frames with 5% overlap
2. compute_relative_pose() → still succeeds
3. Verify inlier_count > 20
4. Verify LightGlue finds matches despite low overlap
### Test 3: Tracking Loss
1. Load frames with 0% overlap (sharp turn)
2. compute_relative_pose() → returns None
3. Verify tracking_good = False
4. Trigger global place recognition
### Test 4: Agricultural Texture
1. Load images of wheat fields (repetitive texture)
2. compute_relative_pose() → SuperPoint handles better than SIFT
3. Verify match quality
### Test 5: Chunk-Aware VO
1. Create chunk_1 and chunk_2
2. compute_relative_pose_in_chunk() for frames in chunk_1
3. compute_relative_pose_in_chunk() for frames in chunk_2
4. Verify factors added to respective chunks
5. Verify chunks optimized independently
## Non-Functional Requirements
### Performance
- **compute_relative_pose**: < 200ms total
  - SuperPoint extraction: ~15ms × 2 = 30ms
  - LightGlue matching: ~50ms
  - Motion estimation: ~10ms
- **Frame rate**: 5-10 FPS processing (meets <5s requirement)
### Accuracy
- **Relative rotation**: ±2° error
- **Relative translation direction**: ±5° error
- **Inlier ratio**: >50% for good tracking
### Reliability
- Handle 100m spacing between frames
- Survive temporary tracking degradation
- Recover from brief occlusions
## Dependencies
### Internal Components
- **F16 Model Manager**: For SuperPoint and LightGlue models
- **F17 Configuration Manager**: For camera parameters
- **H01 Camera Model**: For coordinate normalization
- **H05 Performance Monitor**: For timing measurements
- **F10 Factor Graph Optimizer**: For chunk-scoped factor addition
### External Dependencies
- **SuperPoint**: Feature extraction model
- **LightGlue**: Feature matching model
- **opencv-python**: Essential Matrix estimation
- **numpy**: Matrix operations
## Data Models
### Features
```python
class Features(BaseModel):
    keypoints: np.ndarray    # (N, 2)
    descriptors: np.ndarray  # (N, 256)
    scores: np.ndarray       # (N,)
```
### Matches
```python
class Matches(BaseModel):
    matches: np.ndarray     # (M, 2) - pairs of indices
    scores: np.ndarray      # (M,) - match confidence
    keypoints1: np.ndarray  # (M, 2)
    keypoints2: np.ndarray  # (M, 2)
```
### RelativePose
```python
class RelativePose(BaseModel):
    translation: np.ndarray  # (3,) - unit vector
    rotation: np.ndarray     # (3, 3) or (4,) quaternion
    confidence: float
    inlier_count: int
    total_matches: int
    tracking_good: bool
    scale_ambiguous: bool = True
    chunk_id: Optional[str] = None  # Chunk context (if chunk-aware)
```
### Motion
```python
class Motion(BaseModel):
    translation: np.ndarray  # (3,)
    rotation: np.ndarray     # (3, 3)
    inliers: np.ndarray      # Boolean mask
    inlier_count: int
```
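As written, these models subclass Pydantic's `BaseModel`, but `np.ndarray` is not a Pydantic-native type: each model must opt in to arbitrary types or class creation fails. A minimal sketch for `Motion` (the same flag applies to the other three models):

```python
import numpy as np
from pydantic import BaseModel

class Motion(BaseModel):
    translation: np.ndarray  # (3,)
    rotation: np.ndarray     # (3, 3)
    inliers: np.ndarray      # Boolean mask
    inlier_count: int

    class Config:
        # np.ndarray has no built-in Pydantic validator; without this flag
        # defining the model raises a schema-generation error.
        arbitrary_types_allowed = True
```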