mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 15:06:35 +00:00
initial structure implemented
docs -> _docs
# Feature: Combined Neural Inference

## Description

Single-pass SuperPoint+LightGlue TensorRT inference for feature extraction and matching. Takes two images as input and outputs matched keypoints directly, eliminating intermediate feature-transfer overhead.

## Component APIs Implemented

- `extract_and_match(image1: np.ndarray, image2: np.ndarray) -> Matches`

## External Tools and Services

- **Combined SuperPoint+LightGlue TensorRT Engine**: Single model combining extraction and matching
- **F16 Model Manager**: Provides pre-loaded TensorRT engine instance
- **Reference**: [D_VINS](https://github.com/kajo-kurisu/D_VINS/) for TensorRT optimization patterns

## Internal Methods

### `_preprocess_images(image1: np.ndarray, image2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]`

Converts images to grayscale if needed, normalizes pixel values, and resizes them to the model input dimensions.
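A minimal sketch of this step for a single image, assuming the 752×480 engine input from the trtexec export below and a nearest-neighbour resize (the names and resize strategy here are illustrative, not the actual implementation):

```python
import numpy as np

# Hypothetical preprocessing sketch; the 752x480 input size matches the
# trtexec shapes, but the function name and resize method are assumptions.
MODEL_W, MODEL_H = 752, 480

def preprocess(image: np.ndarray, out_hw=(MODEL_H, MODEL_W)) -> np.ndarray:
    # Grayscale conversion (BT.601 luma weights) if the input is RGB.
    if image.ndim == 3:
        image = image @ np.array([0.299, 0.587, 0.114])
    # Nearest-neighbour resize to the engine's fixed input dimensions.
    h, w = image.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    image = image[rows][:, cols]
    # Normalize to [0, 1] float32 and add (batch, channel) dims -> (1, 1, H, W).
    return (image / 255.0).astype(np.float32)[None, None]
```

The returned tensor matches the engine's `(1, 1, H, W)` grayscale input layout described under the TensorRT Engine Configuration.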
### `_run_combined_inference(img1_tensor: np.ndarray, img2_tensor: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]`

Executes the combined TensorRT engine. Returns matched keypoints from both images and match confidence scores.

**Internal Pipeline** (within a single inference):

1. SuperPoint extracts keypoints + descriptors from both images
2. LightGlue performs attention-based matching with adaptive depth
3. Dustbin mechanism filters unmatched features
4. Returns only matched keypoint pairs

### `_filter_matches_by_confidence(keypoints1: np.ndarray, keypoints2: np.ndarray, scores: np.ndarray, threshold: float) -> Matches`

Filters out low-confidence matches and constructs the final Matches object.
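The filtering step can be sketched as a boolean mask over the M candidate matches (a plain dict stands in for the project's Matches container here; the threshold default is an assumption):

```python
import numpy as np

# Illustrative confidence filter; the dict below is a stand-in for the
# project's actual Matches object.
def filter_matches_by_confidence(keypoints1, keypoints2, scores, threshold=0.2):
    keep = scores >= threshold  # boolean mask over the M candidate matches
    return {
        "keypoints1": keypoints1[keep],
        "keypoints2": keypoints2[keep],
        "scores": scores[keep],
    }
```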
## Architecture Notes

### Combined Model Benefits

- Single GPU memory transfer (both images together)
- No intermediate descriptor serialization
- Optimized attention layers for batch processing
- Adaptive depth exits early for easy (high-overlap) pairs

### TensorRT Engine Configuration

```
Input shapes:
  image1: (1, 1, H, W) - grayscale
  image2: (1, 1, H, W) - grayscale

Output shapes:
  keypoints1: (1, M, 2) - matched keypoints from image1
  keypoints2: (1, M, 2) - matched keypoints from image2
  scores:     (1, M)    - match confidence scores
```

### Model Export (reference from D_VINS)

```bash
trtexec --onnx='superpoint_lightglue_combined.onnx' \
        --fp16 \
        --minShapes=image1:1x1x480x752,image2:1x1x480x752 \
        --optShapes=image1:1x1x480x752,image2:1x1x480x752 \
        --maxShapes=image1:1x1x480x752,image2:1x1x480x752 \
        --saveEngine=sp_lg_combined.engine \
        --warmUp=500 --duration=10
```
## Unit Tests

### Test: Grayscale Conversion
- Input: Two RGB images (H×W×3)
- Verify: `_preprocess_images` returns grayscale tensors

### Test: Grayscale Passthrough
- Input: Two grayscale images (H×W)
- Verify: `_preprocess_images` returns them unchanged

### Test: High Overlap Matching
- Input: Two images with >50% overlap
- Verify: Returns 500+ matches
- Verify: Inference time ~35-50ms (adaptive depth fast path)

### Test: Low Overlap Matching
- Input: Two images with 5-10% overlap
- Verify: Returns 20-50 matches
- Verify: Inference time ~80ms (full depth)

### Test: No Overlap Handling
- Input: Two non-overlapping images
- Verify: Returns <10 matches
- Verify: No exception raised

### Test: Confidence Score Range
- Input: Any valid image pair
- Verify: All scores in [0, 1] range

### Test: Empty/Invalid Image Handling
- Input: Black/invalid image pair
- Verify: Returns empty Matches (never raises an exception)

### Test: High Resolution Images
- Input: Two 6252×4168 images
- Verify: Preprocessing resizes appropriately
- Verify: Completes within performance budget

### Test: Output Shape Consistency
- Input: Any valid image pair
- Verify: `keypoints1.shape[0] == keypoints2.shape[0] == scores.shape[0]`
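The shape invariant in the last test can be written as a small helper (illustrative only; not part of the component's API):

```python
import numpy as np

# Checks the one-row-per-match invariant across the three output arrays.
def check_output_shapes(keypoints1, keypoints2, scores):
    m = scores.shape[0]
    assert scores.ndim == 1
    assert keypoints1.shape == (m, 2)
    assert keypoints2.shape == (m, 2)
    return m
```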
## Integration Tests

### Test: Model Manager Integration
- Verify: Successfully retrieves the combined SP+LG engine from F16
- Verify: Engine is loaded with the correct TensorRT backend

### Test: Performance Budget
- Input: Two FullHD images
- Verify: Combined inference completes in <80ms on RTX 2060

### Test: Adaptive Depth Behavior
- Input: High-overlap pair, then low-overlap pair
- Verify: High-overlap pair completes faster than low-overlap pair

### Test: Agricultural Texture Handling
- Input: Two wheat field images with repetitive patterns
- Verify: Produces valid matches despite repetitive textures

### Test: Memory Efficiency
- Verify: Single GPU memory allocation vs. two separate models
- Verify: No intermediate descriptor buffer allocation

### Test: Batch Consistency
- Input: Same image pair multiple times
- Verify: Consistent match results (deterministic)
# Feature: Geometric Pose Estimation

## Description

Estimates camera motion from matched keypoints using Essential Matrix decomposition. Orchestrates the full visual odometry pipeline and provides tracking quality assessment. Pure geometric computation (non-ML).

## Component APIs Implemented

- `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
- `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`

## External Tools and Services

- **opencv-python**: Essential Matrix estimation via RANSAC, matrix decomposition
- **numpy**: Matrix operations, coordinate normalization
- **F17 Configuration Manager**: Camera parameters (focal length, principal point)
- **H01 Camera Model**: Coordinate normalization utilities

## Internal Methods

### `_normalize_keypoints(keypoints: np.ndarray, camera_params: CameraParameters) -> np.ndarray`

Normalizes pixel coordinates to camera-centered coordinates using the intrinsic matrix K:

```
normalized = K^(-1) @ [x, y, 1]^T
```
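The formula above, written out in numpy (a sketch; it assumes a single focal length fx = fy, as CameraParameters suggests):

```python
import numpy as np

# Pixel -> normalized camera coordinates via the inverse intrinsic matrix.
def normalize_keypoints(keypoints, focal_length, principal_point):
    cx, cy = principal_point
    K_inv = np.array([
        [1.0 / focal_length, 0.0, -cx / focal_length],
        [0.0, 1.0 / focal_length, -cy / focal_length],
        [0.0, 0.0, 1.0],
    ])
    homogeneous = np.column_stack([keypoints, np.ones(len(keypoints))])  # (N, 3)
    return (K_inv @ homogeneous.T).T[:, :2]  # drop the trailing z = 1
```

For the intrinsics used in the unit test below (fx=1000, cx=640, cy=360), the principal point maps to the origin and a point one focal length to its right maps to (1, 0).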
### `_estimate_essential_matrix(points1: np.ndarray, points2: np.ndarray) -> Tuple[Optional[np.ndarray], np.ndarray]`

RANSAC-based Essential Matrix estimation. Returns the E matrix and an inlier mask.

- Uses cv2.findEssentialMat with RANSAC
- Requires a minimum of 8 point correspondences
- Returns None if there are insufficient inliers
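The production path uses `cv2.findEssentialMat`; as a dependency-free illustration, this is the eight-point core that such an estimator solves on each sample of normalized correspondences (no RANSAC or outlier handling here):

```python
import numpy as np

# Eight-point estimate of E from normalized correspondences (no RANSAC).
# Each correspondence contributes one row of A, since x2^T E x1 = 0 is
# linear in the nine entries of E.
def eight_point_essential(pts1: np.ndarray, pts2: np.ndarray) -> np.ndarray:
    n = len(pts1)
    x1 = np.column_stack([pts1, np.ones(n)])
    x2 = np.column_stack([pts2, np.ones(n)])
    A = np.einsum("ni,nj->nij", x2, x1).reshape(n, 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)  # least-squares null vector of A
    # Project onto the essential-matrix manifold: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

With noise-free inlier data the epipolar residuals x2ᵀEx1 vanish; a RANSAC loop wraps this core to reject outliers and produce the inlier mask.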
### `_decompose_essential_matrix(E: np.ndarray, points1: np.ndarray, points2: np.ndarray, camera_params: CameraParameters) -> Tuple[np.ndarray, np.ndarray]`

Decomposes the Essential Matrix into rotation R and translation t.

- Uses cv2.recoverPose
- Selects the correct solution via the cheirality check
- Translation is a unit vector (scale is ambiguous)

### `_compute_tracking_quality(inlier_count: int, total_matches: int) -> Tuple[float, bool]`

Computes the confidence score and tracking_good flag:

- **Good**: inlier_count > 50, inlier_ratio > 0.5 → confidence > 0.8
- **Degraded**: inlier_count 20-50 → confidence 0.4-0.8
- **Lost**: inlier_count < 20 → tracking_good = False
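One way to realize this mapping; only the band thresholds come from the spec above, while the linear interpolation across the degraded band is an assumption:

```python
# Maps inlier statistics to (confidence, tracking_good) per the bands above.
# The interpolation within each band is illustrative, not the actual code.
def compute_tracking_quality(inlier_count: int, total_matches: int):
    inlier_ratio = inlier_count / max(total_matches, 1)
    if inlier_count < 20:
        return 0.0, False  # Lost
    if inlier_count > 50 and inlier_ratio > 0.5:
        # Good: confidence strictly above 0.8, capped at 1.0.
        return min(0.8 + 0.2 * inlier_ratio, 1.0), True
    # Degraded: map inlier_count in [20, 50] linearly onto [0.4, 0.8].
    clipped = min(inlier_count, 50)
    return 0.4 + 0.4 * (clipped - 20) / 30.0, True
```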
### `_build_relative_pose(motion: Motion, matches: Matches) -> RelativePose`

Constructs the RelativePose dataclass from the motion estimate and match statistics.

## Unit Tests

### Test: Keypoint Normalization
- Input: Pixel coordinates and camera params (fx=1000, cx=640, cy=360)
- Verify: Output is centered at the principal point and scaled by the focal length

### Test: Essential Matrix Estimation - Good Data
- Input: 100+ inlier correspondences from known motion
- Verify: Returns a valid Essential Matrix
- Verify: det(E) ≈ 0
- Verify: Singular values satisfy σ₁ ≈ σ₂, σ₃ ≈ 0

### Test: Essential Matrix Estimation - Insufficient Points
- Input: <8 point correspondences
- Verify: Returns None

### Test: Essential Matrix Decomposition
- Input: Valid Essential Matrix from known motion
- Verify: Returns a valid rotation (det(R) = 1, RᵀR = I)
- Verify: Translation is a unit vector (‖t‖ = 1)
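The two decomposition checks can be packaged as a small helper mirroring the assertions above (illustrative, not project code):

```python
import numpy as np

# Validates a decomposition result: R is a proper rotation and t has unit norm.
def is_valid_motion(R: np.ndarray, t: np.ndarray, tol: float = 1e-6) -> bool:
    proper_rotation = (
        np.allclose(R.T @ R, np.eye(3), atol=tol)  # orthonormal columns
        and abs(np.linalg.det(R) - 1.0) < tol       # det = +1 (no reflection)
    )
    unit_translation = abs(np.linalg.norm(t) - 1.0) < tol
    return proper_rotation and unit_translation
```

Note that the determinant check matters: a reflection satisfies RᵀR = I but has det(R) = -1 and is not a valid camera rotation.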
### Test: Tracking Quality - Good
- Input: inlier_count=100, total_matches=150
- Verify: tracking_good=True
- Verify: confidence > 0.8

### Test: Tracking Quality - Degraded
- Input: inlier_count=30, total_matches=50
- Verify: tracking_good=True
- Verify: 0.4 < confidence < 0.8

### Test: Tracking Quality - Lost
- Input: inlier_count=10, total_matches=20
- Verify: tracking_good=False

### Test: Scale Ambiguity Marker
- Input: Any valid motion estimate
- Verify: Translation vector has unit norm (‖t‖ = 1)
- Verify: scale_ambiguous flag is True

### Test: Pure Rotation Handling
- Input: Matches from purely rotational motion (no translation)
- Verify: Returns a valid pose
- Verify: Translation direction is arbitrary (a poorly constrained unit vector)
### Test: Forward Motion
- Input: Matches from forward camera motion
- Verify: Translation z-component is positive

## Integration Tests

### Test: Full Pipeline - Normal Flight
- Input: Consecutive frames with 50% overlap
- Verify: Returns a valid RelativePose
- Verify: inlier_count > 100
- Verify: Total time < 150ms

### Test: Full Pipeline - Low Overlap
- Input: Frames with 5% overlap
- Verify: Returns a valid RelativePose
- Verify: inlier_count > 20

### Test: Full Pipeline - Tracking Loss
- Input: Non-overlapping frames (sharp turn)
- Verify: Returns None
- Verify: No exception raised

### Test: Configuration Manager Integration
- Verify: Successfully retrieves camera_params from F17
- Verify: Parameters match the expected resolution and focal length

### Test: Camera Model Integration
- Verify: H01 normalization produces correct coordinates
- Verify: Consistent with OpenCV undistortion

### Test: Pipeline Orchestration
- Verify: extract_and_match is called once (combined inference)
- Verify: estimate_motion is called with the correct params
- Verify: Returns a RelativePose with all fields populated

### Test: Agricultural Environment
- Input: Wheat field images with repetitive texture
- Verify: Pipeline succeeds with a reasonable inlier count

### Test: Known Motion Validation
- Input: Synthetic image pair with known ground-truth motion
- Verify: Estimated rotation within ±2° of ground truth
- Verify: Estimated translation direction within ±5° of ground truth
# Sequential Visual Odometry

## Interface Definition

**Interface Name**: `ISequentialVisualOdometry`

### Interface Methods

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np


class ISequentialVisualOdometry(ABC):
    @abstractmethod
    def compute_relative_pose(self, prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]:
        pass

    @abstractmethod
    def extract_and_match(self, image1: np.ndarray, image2: np.ndarray) -> Matches:
        pass

    @abstractmethod
    def estimate_motion(self, matches: Matches, camera_params: CameraParameters) -> Optional[Motion]:
        pass
```

**Note**: F07 is chunk-agnostic. It only computes relative poses between images. The caller (F02.2 Flight Processing Engine) determines which chunk the frames belong to and routes factors to the appropriate subgraph via F12 → F10.
## Component Description

### Responsibilities
- Combined SuperPoint+LightGlue neural network inference for extraction and matching
- Handle <5% overlap scenarios via the LightGlue attention mechanism
- Estimate relative pose (translation + rotation) between frames
- Return relative pose factors for the Factor Graph Optimizer
- Detect tracking loss (low inlier count)

### Scope
- Frame-to-frame visual odometry
- Feature-based motion estimation using the combined neural network
- Handles low overlap and challenging agricultural environments
- Provides relative measurements for trajectory optimization
- **Chunk-agnostic**: F07 doesn't know about chunks. The caller (F02.2) routes results to the appropriate chunk subgraph.

### Architecture Notes
- Uses the combined SuperPoint+LightGlue TensorRT model (single inference pass)
- Reference: [D_VINS](https://github.com/kajo-kurisu/D_VINS/) for TensorRT optimization patterns
- The model outputs matched keypoints directly from the two input images
## API Methods

### `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`

**Description**: Computes the relative camera pose between consecutive frames.

**Called By**:
- Main processing loop (per-frame)

**Input**:
```python
prev_image: np.ndarray  # Previous frame (t-1)
curr_image: np.ndarray  # Current frame (t)
```

**Output**:
```python
RelativePose:
    translation: np.ndarray  # (x, y, z) - unit direction vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix or quaternion
    confidence: float        # 0.0 to 1.0
    inlier_count: int
    total_matches: int
    tracking_good: bool
```

**Processing Flow**:
1. extract_and_match(prev_image, curr_image) → matches
2. estimate_motion(matches, camera_params) → motion
3. Return RelativePose
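The three-step flow can be sketched with plain dicts standing in for the Matches/Motion/RelativePose dataclasses and with the two sub-steps injected as callables (both simplifications are assumptions for illustration):

```python
# Orchestration sketch of the processing flow above. `extract_and_match`
# and `estimate_motion` are stand-in callables; dicts replace dataclasses.
def compute_relative_pose(prev_image, curr_image, camera_params,
                          extract_and_match, estimate_motion):
    matches = extract_and_match(prev_image, curr_image)
    motion = estimate_motion(matches, camera_params)
    if motion is None:  # tracking lost: insufficient matches/inliers
        return None
    return {
        "translation": motion["translation"],
        "rotation": motion["rotation"],
        "inlier_count": motion["inlier_count"],
        "total_matches": len(matches["scores"]),
        "tracking_good": motion["inlier_count"] >= 20,
    }
```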
**Tracking Quality Indicators**:
- **Good tracking**: inlier_count > 50, inlier_ratio > 0.5
- **Degraded tracking**: inlier_count 20-50
- **Tracking loss**: inlier_count < 20

**Error Conditions**:
- Returns `None`: Tracking lost (insufficient matches)

**Test Cases**:
1. **Good overlap (>50%)**: Returns a reliable pose
2. **Low overlap (5-10%)**: Still succeeds with LightGlue
3. **<5% overlap**: May return None (tracking loss)
4. **Agricultural texture**: Handles repetitive patterns

---

### `extract_and_match(image1: np.ndarray, image2: np.ndarray) -> Matches`

**Description**: Single-pass neural network inference combining SuperPoint feature extraction and LightGlue matching.

**Called By**:
- Internal (during compute_relative_pose)

**Input**:
```python
image1: np.ndarray  # First image (H×W×3 or H×W)
image2: np.ndarray  # Second image (H×W×3 or H×W)
```

**Output**:
```python
Matches:
    matches: np.ndarray     # (M, 2) - indices [idx1, idx2]
    scores: np.ndarray      # (M,) - match confidence scores
    keypoints1: np.ndarray  # (M, 2) - matched keypoints from image 1
    keypoints2: np.ndarray  # (M, 2) - matched keypoints from image 2
```

**Processing Details**:
- Uses the F16 Model Manager to get the combined SuperPoint+LightGlue TensorRT engine
- A single inference pass processes both images
- Converts to grayscale internally if needed
- SuperPoint extracts keypoints + 256-dim descriptors
- LightGlue performs attention-based matching with adaptive depth
- "Dustbin" mechanism handles unmatched features

**Performance** (TensorRT on RTX 2060):
- Combined inference: ~50-80ms (vs. ~65-115ms separate)
- Faster for high-overlap pairs (adaptive depth exits early)

**Test Cases**:
1. **High overlap**: ~35ms, 500+ matches
2. **Low overlap (<5%)**: ~80ms, 20-50 matches
3. **No overlap**: Few or no matches (<10)

---
### `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`

**Description**: Estimates camera motion from matched keypoints using the Essential Matrix.

**Called By**:
- Internal (during compute_relative_pose)

**Input**:
```python
matches: Matches
camera_params: CameraParameters:
    focal_length: float
    principal_point: Tuple[float, float]
    resolution: Tuple[int, int]
```

**Output**:
```python
Motion:
    translation: np.ndarray  # (x, y, z) - unit vector (scale ambiguous)
    rotation: np.ndarray     # 3×3 rotation matrix
    inliers: np.ndarray      # Boolean mask of inlier matches
    inlier_count: int
```

**Algorithm**:
1. Normalize keypoint coordinates using camera intrinsics
2. Estimate the Essential Matrix using RANSAC
3. Decompose the Essential Matrix → [R, t]
4. Return motion with inlier mask
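Step 3 is handled by `cv2.recoverPose` in the implementation; as a self-contained sketch, the underlying SVD decomposition that produces the four (R, t) candidates looks like this (the cheirality check that selects among them via triangulation is omitted for brevity):

```python
import numpy as np

# Textbook SVD decomposition of an essential matrix into its four
# (R, t) candidates; cv2.recoverPose additionally applies the cheirality
# check to pick the physically valid one.
def decompose_essential(E: np.ndarray):
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations (det = +1) for the factors.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]  # unit translation, up to sign
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

For an E built from a known (R, t) with ‖t‖ = 1, one of the four candidates recovers the true rotation, and the translation matches up to sign.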
**Scale Ambiguity**:
- Monocular VO has an inherent scale ambiguity
- Translation is a unit vector (direction only, magnitude = 1)
- **F07 does NOT resolve scale** - it only outputs unit translation vectors
- Scale resolution is handled by the F10 Factor Graph Optimizer, which uses:
  - Altitude priors (soft constraints)
  - GSD-based expected displacement calculations (via H02)
  - Absolute GPS anchors from F09 Metric Refinement

**Critical Handoff to F10**:
The caller (F02.2) must pass the unit translation to F10 for scale resolution:
```python
vo_result = F07.compute_relative_pose(prev_image, curr_image)
# vo_result.translation is a UNIT VECTOR (||t|| = 1)

# F02.2 passes to F10, which scales using:
# 1. altitude = F17.get_operational_altitude(flight_id)
# 2. gsd = H02.compute_gsd(altitude, camera_params)
# 3. expected_displacement = frame_spacing * gsd
# 4. scaled_translation = vo_result.translation * expected_displacement
F10.add_relative_factor(flight_id, frame_i, frame_j, vo_result, covariance)
```

**Error Conditions**:
- Returns `None`: Insufficient inliers (<8 points for the Essential Matrix)

**Test Cases**:
1. **Good matches**: Returns motion with a high inlier count
2. **Low inliers**: May return None
3. **Degenerate motion**: Handles pure rotation
## Integration Tests

### Test 1: Normal Flight Sequence
1. Load consecutive frames with 50% overlap
2. compute_relative_pose() → returns a valid pose
3. Verify the translation direction is reasonable
4. Verify inlier_count > 100

### Test 2: Low Overlap Scenario
1. Load frames with 5% overlap
2. compute_relative_pose() → still succeeds
3. Verify inlier_count > 20
4. Verify the combined model finds matches despite the low overlap

### Test 3: Tracking Loss
1. Load frames with 0% overlap (sharp turn)
2. compute_relative_pose() → returns None
3. Verify tracking_good = False
4. Trigger global place recognition

### Test 4: Agricultural Texture
1. Load images of wheat fields (repetitive texture)
2. compute_relative_pose() → SuperPoint handles this better than SIFT
3. Verify match quality

### Test 5: VO with Chunk Routing
1. Create chunk_1 and chunk_2 via F12
2. compute_relative_pose() for frames in chunk_1; F02.2 routes the result to chunk_1
3. compute_relative_pose() for frames in chunk_2; F02.2 routes the result to chunk_2
4. Verify F02.2 calls F12.add_frame_to_chunk() with the correct chunk_id
5. Verify chunks are optimized independently via F10
## Non-Functional Requirements

### Performance
- **compute_relative_pose**: < 150ms total
  - Combined SP+LG inference: ~50-80ms
  - Motion estimation: ~10ms
- **Frame rate**: 5-10 FPS processing (meets the <5s requirement)

### Accuracy
- **Relative rotation**: ±2° error
- **Relative translation direction**: ±5° error
- **Inlier ratio**: >50% for good tracking

### Reliability
- Handle 100m spacing between frames
- Survive temporary tracking degradation
- Recover from brief occlusions

## Dependencies

### Internal Components
- **F16 Model Manager**: For the combined SuperPoint+LightGlue TensorRT model
- **F17 Configuration Manager**: For camera parameters
- **H01 Camera Model**: For coordinate normalization
- **H05 Performance Monitor**: For timing measurements
**Note**: F07 is chunk-agnostic and does NOT depend on the F10 Factor Graph Optimizer. F07 only computes relative poses between images and returns them to the caller (F02.2), which determines which chunk the frames belong to and routes factors to the appropriate subgraph via F12 → F10.

### External Dependencies
- **Combined SuperPoint+LightGlue**: Single TensorRT engine for extraction + matching
- **opencv-python**: Essential Matrix estimation
- **numpy**: Matrix operations

## Data Models

### Matches
```python
class Matches(BaseModel):
    matches: np.ndarray     # (M, 2) - pairs of indices
    scores: np.ndarray      # (M,) - match confidence
    keypoints1: np.ndarray  # (M, 2)
    keypoints2: np.ndarray  # (M, 2)
```

### RelativePose
```python
class RelativePose(BaseModel):
    translation: np.ndarray  # (3,) - unit vector
    rotation: np.ndarray     # (3, 3) or (4,) quaternion
    confidence: float
    inlier_count: int
    total_matches: int
    tracking_good: bool
    scale_ambiguous: bool = True
    chunk_id: Optional[str] = None
```

### Motion
```python
class Motion(BaseModel):
    translation: np.ndarray  # (3,)
    rotation: np.ndarray     # (3, 3)
    inliers: np.ndarray      # Boolean mask
    inlier_count: int
```

### CameraParameters
```python
class CameraParameters(BaseModel):
    focal_length: float
    principal_point: Tuple[float, float]
    resolution: Tuple[int, int]
```