add chunking

Oleksandr Bezdieniezhnykh
2025-11-27 03:43:19 +02:00
# Sequential Visual Odometry
## Interface Definition
**Interface Name**: `ISequentialVO`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np

# Features, Matches, Motion, RelativePose, and CameraParameters
# are defined under Data Models below.


class ISequentialVO(ABC):
    @abstractmethod
    def compute_relative_pose(self, prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]:
        pass

    @abstractmethod
    def extract_features(self, image: np.ndarray) -> Features:
        pass

    @abstractmethod
    def match_features(self, features1: Features, features2: Features) -> Matches:
        pass

    @abstractmethod
    def estimate_motion(self, matches: Matches, camera_params: CameraParameters) -> Optional[Motion]:
        pass

    @abstractmethod
    def compute_relative_pose_in_chunk(self, prev_image: np.ndarray, curr_image: np.ndarray, chunk_id: str) -> Optional[RelativePose]:
        pass
```
## Component Description
### Responsibilities
- SuperPoint feature extraction from UAV images
- LightGlue feature matching between consecutive frames
- Handle <5% overlap scenarios
- Estimate relative pose (translation + rotation) between frames
- Return relative pose factors for Factor Graph Optimizer
- Detect tracking loss (low inlier count)
- **Chunk-aware VO operations (factors added to chunk subgraph)**
### Scope
- Frame-to-frame visual odometry
- Feature-based motion estimation
- Handles low overlap and challenging agricultural environments
- Provides relative measurements for trajectory optimization
- **Chunk-scoped operations (Atlas multi-map architecture)**
## API Methods
### `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
**Description**: Computes relative camera pose between consecutive frames.
**Called By**:
- Main processing loop (per-frame)
**Input**:
```python
prev_image: np.ndarray # Previous frame (t-1)
curr_image: np.ndarray # Current frame (t)
```
**Output**:
```python
RelativePose:
    translation: np.ndarray  # (x, y, z) unit direction vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix or quaternion
    confidence: float        # 0.0 to 1.0
    inlier_count: int
    total_matches: int
    tracking_good: bool
```
**Processing Flow**:
1. extract_features(prev_image) → features1
2. extract_features(curr_image) → features2
3. match_features(features1, features2) → matches
4. estimate_motion(matches, camera_params) → motion
5. Return RelativePose
**Tracking Quality Indicators**:
- **Good tracking**: inlier_count > 50, inlier_ratio > 0.5
- **Degraded tracking**: inlier_count 20-50
- **Tracking loss**: inlier_count < 20
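The thresholds above can be captured in a small helper; a sketch, where the function name and the combined inlier-ratio check for "good" are illustrative:

```python
def classify_tracking(inlier_count: int, total_matches: int) -> str:
    """Map inlier statistics to the tracking-quality states defined above."""
    inlier_ratio = inlier_count / total_matches if total_matches else 0.0
    if inlier_count > 50 and inlier_ratio > 0.5:
        return "good"      # reliable pose, factor can be trusted
    if inlier_count >= 20:
        return "degraded"  # usable, but confidence should be lowered
    return "lost"          # compute_relative_pose should return None
```

`tracking_good` in the returned `RelativePose` is then simply `classify_tracking(...) == "good"`.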
**Error Conditions**:
- Returns `None`: Tracking lost (insufficient matches)
**Test Cases**:
1. **Good overlap (>50%)**: Returns reliable pose
2. **Low overlap (5-10%)**: Still succeeds with LightGlue
3. **<5% overlap**: May return None (tracking loss)
4. **Agricultural texture**: Handles repetitive patterns
---
### `extract_features(image: np.ndarray) -> Features`
**Description**: Extracts SuperPoint keypoints and descriptors from image.
**Called By**:
- Internal (during compute_relative_pose)
- F08 Global Place Recognition (for descriptor caching)
**Input**:
```python
image: np.ndarray # Input image (H×W×3 or H×W)
```
**Output**:
```python
Features:
    keypoints: np.ndarray    # (N, 2) - (x, y) coordinates
    descriptors: np.ndarray  # (N, 256) - 256-dim descriptors
    scores: np.ndarray       # (N,) - detection confidence scores
```
**Processing Details**:
- Uses F16 Model Manager to get SuperPoint model
- Converts to grayscale if needed
- Non-maximum suppression for keypoint selection
- Typically extracts 500-2000 keypoints per image
**Performance**:
- Inference time: ~15ms with TensorRT on RTX 2060
**Error Conditions**:
- Never fails (returns empty features if image invalid)
**Test Cases**:
1. **FullHD image**: Extracts ~1000 keypoints
2. **High-res image (6252×4168)**: Extracts ~2000 keypoints
3. **Low-texture image**: Extracts fewer keypoints
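The preprocessing and the never-fails contract can be sketched as follows. SuperPoint inference itself comes from F16 Model Manager and is out of scope here; the function names and the dict layout for `Features` are illustrative:

```python
import numpy as np

def preprocess_for_superpoint(image):
    """Convert to grayscale float32 in [0, 1]; return None for unusable input."""
    if image is None or not isinstance(image, np.ndarray) or image.size == 0:
        return None
    if image.ndim == 3 and image.shape[2] == 3:
        # H×W×3 → H×W grayscale (ITU-R BT.601 luma weights)
        image = image @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    elif image.ndim != 2:
        return None
    return image.astype(np.float32) / 255.0

def empty_features():
    """Fallback so extract_features never fails on invalid input."""
    return {
        "keypoints": np.empty((0, 2), dtype=np.float32),
        "descriptors": np.empty((0, 256), dtype=np.float32),
        "scores": np.empty((0,), dtype=np.float32),
    }
```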
---
### `match_features(features1: Features, features2: Features) -> Matches`
**Description**: Matches features using LightGlue attention-based matcher.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
features1: Features # Previous frame features
features2: Features # Current frame features
```
**Output**:
```python
Matches:
    matches: np.ndarray     # (M, 2) - indices [idx1, idx2]
    scores: np.ndarray      # (M,) - match confidence scores
    keypoints1: np.ndarray  # (M, 2) - matched keypoints from frame 1
    keypoints2: np.ndarray  # (M, 2) - matched keypoints from frame 2
```
**Processing Details**:
- Uses F16 Model Manager to get LightGlue model
- Transformer-based attention mechanism
- "Dustbin" mechanism for unmatched features
- Adaptive depth (exits early for easy matches)
- **Critical**: Handles <5% overlap far better than classical nearest-neighbour matching
**Performance**:
- Inference time: ~35-100ms (adaptive depth)
- Faster for high-overlap, slower for low-overlap
**Test Cases**:
1. **High overlap**: Fast matching (~35ms), 500+ matches
2. **Low overlap (<5%)**: Slower (~100ms), 20-50 matches
3. **No overlap**: Few or no matches (< 10)
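LightGlue is a learned matcher loaded through F16. As a classical stand-in that produces the same `Matches` layout, mutual nearest-neighbour matching over the 256-dim descriptors can be sketched as below; this is *not* the actual matcher and will not survive <5% overlap the way LightGlue does:

```python
import numpy as np

def mutual_nn_match(desc1, desc2, min_score=0.0):
    """Mutual nearest-neighbour matching on L2-normalised descriptors."""
    sim = desc1 @ desc2.T       # (N1, N2) cosine similarities
    nn12 = sim.argmax(axis=1)   # best image-2 index for each image-1 descriptor
    nn21 = sim.argmax(axis=0)   # best image-1 index for each image-2 descriptor
    idx1 = np.arange(desc1.shape[0])
    # keep only pairs that agree in both directions and clear the score floor
    keep = (nn21[nn12] == idx1) & (sim[idx1, nn12] > min_score)
    matches = np.stack([idx1[keep], nn12[keep]], axis=1)  # (M, 2) index pairs
    return matches, sim[idx1, nn12][keep]                 # (M,) confidence scores
```

`keypoints1`/`keypoints2` are then gathered as `features1.keypoints[matches[:, 0]]` and `features2.keypoints[matches[:, 1]]`.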
---
### `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`
**Description**: Estimates camera motion from matched keypoints using Essential Matrix.
**Called By**:
- Internal (during compute_relative_pose)
**Input**:
```python
matches: Matches
camera_params: CameraParameters:
    focal_length: float
    principal_point: Tuple[float, float]
    resolution: Tuple[int, int]
```
**Output**:
```python
Motion:
    translation: np.ndarray  # (x, y, z) - unit vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix
    inliers: np.ndarray      # Boolean mask of inlier matches
    inlier_count: int
```
**Algorithm**:
1. Normalize keypoint coordinates using camera intrinsics
2. Estimate Essential Matrix using RANSAC
3. Decompose Essential Matrix → [R, t]
4. Return motion with inlier mask
**Scale Ambiguity**:
- Monocular VO has inherent scale ambiguity
- Translation is unit vector (direction only, magnitude = 1)
- **F07 does NOT resolve scale** - it only outputs unit translation vectors
- Scale resolution is handled by F10 Factor Graph Optimizer, which uses:
  - Altitude priors (soft constraints)
  - GSD-based expected displacement calculations (via H02)
  - Absolute GPS anchors from F09 Metric Refinement
**Error Conditions**:
- Returns `None`: Insufficient inliers (< 8 points for Essential Matrix)
**Test Cases**:
1. **Good matches**: Returns motion with high inlier count
2. **Low inliers**: May return None
3. **Degenerate motion**: Handles pure rotation
---
### `compute_relative_pose_in_chunk(prev_image: np.ndarray, curr_image: np.ndarray, chunk_id: str) -> Optional[RelativePose]`
**Description**: Computes relative camera pose between consecutive frames within a chunk context.
**Called By**:
- F02 Flight Processor (chunk-aware processing)
**Input**:
```python
prev_image: np.ndarray # Previous frame (t-1)
curr_image: np.ndarray # Current frame (t)
chunk_id: str # Chunk identifier for context
```
**Output**:
```python
RelativePose:
    translation: np.ndarray  # (x, y, z) unit direction vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix or quaternion
    confidence: float        # 0.0 to 1.0
    inlier_count: int
    total_matches: int
    tracking_good: bool
    chunk_id: str            # Chunk context
```
**Processing Flow**:
1. Same as compute_relative_pose() (SuperPoint + LightGlue)
2. Return RelativePose with chunk_id context
3. Factor will be added to chunk's subgraph (not global graph)
**Chunk Context**:
- VO operations are chunk-scoped
- Factors added to chunk's subgraph via F10.add_relative_factor_to_chunk()
- Chunk isolation ensures independent optimization
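The routing logic can be sketched as a thin wrapper; the class name is illustrative, `add_relative_factor_to_chunk` is the F10 method named above, and the pose is shown as a dict for brevity:

```python
class SequentialVOChunked:
    """Sketch: chunk-aware VO reuses the frame-to-frame pipeline and routes
    the resulting factor to the chunk's subgraph, not the global graph."""

    def __init__(self, vo, factor_graph):
        self.vo = vo                      # exposes compute_relative_pose(prev, curr)
        self.factor_graph = factor_graph  # exposes add_relative_factor_to_chunk(chunk_id, pose)

    def compute_relative_pose_in_chunk(self, prev_image, curr_image, chunk_id):
        pose = self.vo.compute_relative_pose(prev_image, curr_image)
        if pose is None:
            return None                   # tracking lost: no factor added anywhere
        pose["chunk_id"] = chunk_id       # attach chunk context
        self.factor_graph.add_relative_factor_to_chunk(chunk_id, pose)
        return pose
```

Because every factor is routed by `chunk_id`, two chunks never share VO factors, which is what the chunk-isolation test cases verify.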
**Test Cases**:
1. **Chunk-aware VO**: Returns RelativePose with chunk_id
2. **Chunk isolation**: Factors isolated to chunk
3. **Multiple chunks**: VO operations don't interfere between chunks
## Integration Tests
### Test 1: Normal Flight Sequence
1. Load consecutive frames with 50% overlap
2. compute_relative_pose() → returns valid pose
3. Verify translation direction reasonable
4. Verify inlier_count > 100
### Test 2: Low Overlap Scenario
1. Load frames with 5% overlap
2. compute_relative_pose() → still succeeds
3. Verify inlier_count > 20
4. Verify LightGlue finds matches despite low overlap
### Test 3: Tracking Loss
1. Load frames with 0% overlap (sharp turn)
2. compute_relative_pose() → returns None
3. Verify tracking_good = False
4. Trigger global place recognition
### Test 4: Agricultural Texture
1. Load images of wheat fields (repetitive texture)
2. compute_relative_pose() → SuperPoint handles better than SIFT
3. Verify match quality
### Test 5: Chunk-Aware VO
1. Create chunk_1 and chunk_2
2. compute_relative_pose_in_chunk() for frames in chunk_1
3. compute_relative_pose_in_chunk() for frames in chunk_2
4. Verify factors added to respective chunks
5. Verify chunks optimized independently
## Non-Functional Requirements
### Performance
- **compute_relative_pose**: < 200ms total
  - SuperPoint extraction: ~15ms × 2 = 30ms
  - LightGlue matching: ~50ms
  - Motion estimation: ~10ms
- **Frame rate**: 5-10 FPS processing (meets <5s requirement)
### Accuracy
- **Relative rotation**: ±2° error
- **Relative translation direction**: ±5° error
- **Inlier ratio**: >50% for good tracking
### Reliability
- Handle 100m spacing between frames
- Survive temporary tracking degradation
- Recover from brief occlusions
## Dependencies
### Internal Components
- **F16 Model Manager**: For SuperPoint and LightGlue models
- **F17 Configuration Manager**: For camera parameters
- **H01 Camera Model**: For coordinate normalization
- **H05 Performance Monitor**: For timing measurements
- **F10 Factor Graph Optimizer**: For chunk-scoped factor addition
### External Dependencies
- **SuperPoint**: Feature extraction model
- **LightGlue**: Feature matching model
- **opencv-python**: Essential Matrix estimation
- **numpy**: Matrix operations
## Data Models
### Features
```python
class Features(BaseModel):
    keypoints: np.ndarray    # (N, 2)
    descriptors: np.ndarray  # (N, 256)
    scores: np.ndarray       # (N,)
```
### Matches
```python
class Matches(BaseModel):
    matches: np.ndarray     # (M, 2) - pairs of indices
    scores: np.ndarray      # (M,) - match confidence
    keypoints1: np.ndarray  # (M, 2)
    keypoints2: np.ndarray  # (M, 2)
```
### RelativePose
```python
class RelativePose(BaseModel):
    translation: np.ndarray  # (3,) - unit vector
    rotation: np.ndarray     # (3, 3) or (4,) quaternion
    confidence: float
    inlier_count: int
    total_matches: int
    tracking_good: bool
    scale_ambiguous: bool = True
    chunk_id: Optional[str] = None  # Chunk context (if chunk-aware)
```
### Motion
```python
class Motion(BaseModel):
    translation: np.ndarray  # (3,)
    rotation: np.ndarray     # (3, 3)
    inliers: np.ndarray      # Boolean mask
    inlier_count: int
```
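As written, these models subclass Pydantic's `BaseModel`, but `np.ndarray` is not a Pydantic-native type: each model must opt in to arbitrary types or class creation fails. A minimal sketch for `Motion` (the same flag applies to the other three models):

```python
import numpy as np
from pydantic import BaseModel

class Motion(BaseModel):
    translation: np.ndarray  # (3,)
    rotation: np.ndarray     # (3, 3)
    inliers: np.ndarray      # Boolean mask
    inlier_count: int

    class Config:
        # np.ndarray has no built-in Pydantic validator; without this flag
        # defining the model raises a schema-generation error.
        arbitrary_types_allowed = True
```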