mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 22:46:36 +00:00
name components correctly
update tutorial with 3. implementation phase; add implementation commands
@@ -1,41 +0,0 @@
# Feature: Tile Cache Management

## Description
Manages persistent disk-based caching of satellite tiles with flight-specific organization. Provides storage, retrieval, and cleanup of cached tiles to minimize redundant API calls and enable offline access to prefetched data.

## Component APIs Implemented
- `cache_tile(flight_id: str, tile_coords: TileCoords, tile_data: np.ndarray) -> bool`
- `get_cached_tile(flight_id: str, tile_coords: TileCoords) -> Optional[np.ndarray]`
- `clear_flight_cache(flight_id: str) -> bool`

## External Tools and Services
- **diskcache**: Persistent cache library for disk storage management
- **opencv-python**: Image serialization (PNG encoding/decoding)
- **numpy**: Image array handling

## Internal Methods
- `_generate_cache_path(flight_id: str, tile_coords: TileCoords) -> Path`: Generates the cache file path following the pattern `/satellite_cache/{flight_id}/{zoom}/{tile_x}_{tile_y}.png`
- `_ensure_cache_directory(flight_id: str, zoom: int) -> bool`: Creates the cache directory structure if it does not exist
- `_serialize_tile(tile_data: np.ndarray) -> bytes`: Encodes a tile array to PNG bytes
- `_deserialize_tile(data: bytes) -> Optional[np.ndarray]`: Decodes PNG bytes to a tile array
- `_update_cache_index(flight_id: str, tile_coords: TileCoords, action: str) -> None`: Updates the cache index for tracking
- `_check_global_cache(tile_coords: TileCoords) -> Optional[np.ndarray]`: Fallback lookup in the shared cache
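The path scheme in `_generate_cache_path` can be sketched directly. `TileCoords` below is a stand-in dataclass (its field names are assumed from the path pattern), and the cache root is taken from the documented pattern; this is an illustrative sketch, not the project's actual implementation:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class TileCoords:
    tile_x: int
    tile_y: int
    zoom: int

CACHE_ROOT = Path("/satellite_cache")  # root assumed from the documented path pattern

def generate_cache_path(flight_id: str, tc: TileCoords) -> Path:
    # /satellite_cache/{flight_id}/{zoom}/{tile_x}_{tile_y}.png
    return CACHE_ROOT / flight_id / str(tc.zoom) / f"{tc.tile_x}_{tc.tile_y}.png"
```

Making the dataclass frozen keeps `TileCoords` hashable, which is convenient if the same type also serves as a cache-index key.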
## Unit Tests
1. **cache_tile_success**: Cache new tile → file created at correct path
2. **cache_tile_overwrite**: Cache existing tile → file updated
3. **cache_tile_disk_error**: Simulate disk full → returns False
4. **get_cached_tile_hit**: Tile exists → returns np.ndarray
5. **get_cached_tile_miss**: Tile does not exist → returns None
6. **get_cached_tile_corrupted**: Invalid file → returns None, logs warning
7. **get_cached_tile_global_fallback**: Not in flight cache, found in global → returns tile
8. **clear_flight_cache_success**: Flight with tiles → all files removed
9. **clear_flight_cache_nonexistent**: No such flight → returns True (no-op)
10. **cache_path_generation**: Various tile coords → correct paths generated

## Integration Tests
1. **cache_round_trip**: cache_tile() then get_cached_tile() → returns identical data
2. **multi_flight_isolation**: Cache tiles for flight A and B → each retrieves only its own tiles
3. **clear_does_not_affect_others**: Clear flight A → flight B cache intact
4. **large_cache_handling**: Cache 1000 tiles → all retrievable
@@ -1,44 +0,0 @@
# Feature: Tile Coordinate Operations

## Description
Handles all tile coordinate calculations including GPS-to-tile conversion, tile grid computation, and grid expansion for progressive search. Delegates core Web Mercator projection math to H06 Web Mercator Utils to maintain a single source of truth.

## Component APIs Implemented
- `compute_tile_coords(lat: float, lon: float, zoom: int) -> TileCoords`
- `compute_tile_bounds(tile_coords: TileCoords) -> TileBounds`
- `get_tile_grid(center: TileCoords, grid_size: int) -> List[TileCoords]`
- `expand_search_grid(center: TileCoords, current_size: int, new_size: int) -> List[TileCoords]`

## External Tools and Services
None (pure computation, delegates to H06)

## Internal Dependencies
- **H06 Web Mercator Utils**: Core projection calculations
  - `H06.latlon_to_tile()` for coordinate conversion
  - `H06.compute_tile_bounds()` for bounding box calculation

## Internal Methods
- `_compute_grid_offset(grid_size: int) -> int`: Calculates the offset from center for a symmetric grid (e.g., 3×3 → offset 1)
- `_grid_size_to_dimensions(grid_size: int) -> Tuple[int, int]`: Maps grid_size (1, 4, 9, 16, 25) to (rows, cols)
- `_generate_grid_tiles(center: TileCoords, rows: int, cols: int) -> List[TileCoords]`: Generates all tile coords in the grid
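For orientation, these are the standard slippy-map Web Mercator formulas that `H06.latlon_to_tile()` presumably wraps. This is the textbook tiling math, shown under that assumption, not the project's actual H06 code:

```python
import math

def latlon_to_tile(lat: float, lon: float, zoom: int):
    """GPS -> (tile_x, tile_y) using standard Web Mercator tiling."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid index range for edge cases (lat=90, lon=180, lon=-180)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_to_latlon(x: int, y: int, zoom: int):
    """Inverse: GPS of the tile's NW corner (the basis for tile bounds)."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lat, lon
```

Calling `tile_to_latlon` for `(x, y)` and `(x + 1, y + 1)` yields the four corner coordinates that `compute_tile_bounds` exposes.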
## Unit Tests
1. **compute_tile_coords_ukraine**: Ukraine GPS coords at zoom 19 → valid tile coords
2. **compute_tile_coords_origin**: lat=0, lon=0 → correct center tile
3. **compute_tile_coords_edge_cases**: lat=90, lon=180, lon=-180 → handled correctly
4. **compute_tile_bounds_zoom19**: Zoom 19 tile → GSD ≈ 0.3 m/pixel
5. **compute_tile_bounds_corners**: Returns valid GPS for all 4 corners
6. **get_tile_grid_1**: grid_size=1 → returns [center]
7. **get_tile_grid_4**: grid_size=4 → returns 4 tiles (2×2)
8. **get_tile_grid_9**: grid_size=9 → returns 9 tiles (3×3) centered
9. **get_tile_grid_25**: grid_size=25 → returns 25 tiles (5×5)
10. **expand_search_grid_1_to_4**: Returns 3 new tiles only
11. **expand_search_grid_4_to_9**: Returns 5 new tiles only
12. **expand_search_grid_9_to_16**: Returns 7 new tiles only
13. **expand_search_grid_no_duplicates**: Expanded tiles not in original set
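The centered-grid and expansion behavior exercised by tests 6-13 can be sketched with plain tuples standing in for `TileCoords`. The offset convention for even-sided grids (2×2, 4×4) is an assumption chosen so that each grid is a subset of the next larger one:

```python
def grid_offsets(grid_size: int):
    """Offsets for a centered square grid; grid_size must be a perfect square (1, 4, 9, 16, 25)."""
    side = int(round(grid_size ** 0.5))
    lo = -(side - 1) // 2  # e.g. side=3 -> offsets -1..1; side=2 -> -1..0 (assumed convention)
    return [(dx, dy) for dy in range(lo, lo + side) for dx in range(lo, lo + side)]

def get_tile_grid(center, grid_size):
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in grid_offsets(grid_size)]

def expand_search_grid(center, current_size, new_size):
    """Return only the tiles of the larger grid that are not already in the current one."""
    current = set(get_tile_grid(center, current_size))
    return [t for t in get_tile_grid(center, new_size) if t not in current]
```

Because the smaller grid is always contained in the larger one, expanding 1→4, 4→9, and 9→16 yields exactly 3, 5, and 7 new tiles, matching the unit tests.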
## Integration Tests
1. **h06_delegation_verify**: compute_tile_coords() result matches direct H06.latlon_to_tile()
2. **grid_bounds_coverage**: get_tile_grid(9) → all 9 tile bounds form a contiguous area
3. **expand_completes_grid**: get_tile_grid(4) + expand_search_grid(4, 9) == get_tile_grid(9)
@@ -1,51 +0,0 @@
# Feature: Tile Fetching

## Description
Handles HTTP-based satellite tile retrieval from the external provider API with multiple fetching patterns: single tile, grid, progressive expansion, and route corridor prefetching. Integrates with the cache for performance and supports parallel fetching for throughput.

## Component APIs Implemented
- `fetch_tile(lat: float, lon: float, zoom: int) -> Optional[np.ndarray]`
- `fetch_tile_grid(center_lat: float, center_lon: float, grid_size: int, zoom: int) -> Dict[str, np.ndarray]`
- `prefetch_route_corridor(waypoints: List[GPSPoint], corridor_width_m: float, zoom: int) -> bool`
- `progressive_fetch(center_lat: float, center_lon: float, grid_sizes: List[int], zoom: int) -> Iterator[Dict[str, np.ndarray]]`

## External Tools and Services
- **Satellite Provider API**: HTTP tile source (`GET /api/satellite/tiles/latlon`)
- **httpx** (async support) or **requests**: HTTP client
- **numpy**: Image array handling

## Internal Dependencies
- **01_feature_tile_cache_management**: cache_tile, get_cached_tile
- **02_feature_tile_coordinate_operations**: compute_tile_coords, get_tile_grid

## Internal Methods
- `_fetch_from_api(tile_coords: TileCoords) -> Optional[np.ndarray]`: HTTP GET to the satellite provider; handles response parsing
- `_fetch_with_retry(tile_coords: TileCoords, max_retries: int = 3) -> Optional[np.ndarray]`: Wraps _fetch_from_api with retry logic
- `_fetch_tiles_parallel(tiles: List[TileCoords], max_concurrent: int = 20) -> Dict[str, np.ndarray]`: Parallel fetching with connection pooling
- `_compute_corridor_tiles(waypoints: List[GPSPoint], corridor_width_m: float, zoom: int) -> List[TileCoords]`: Calculates the tiles covering the route corridor polygon
- `_generate_tile_id(tile_coords: TileCoords) -> str`: Creates a unique tile identifier string
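`_fetch_with_retry` can be sketched generically. The spec only fixes `max_retries=3`; the exponential-backoff schedule below is an assumption, and `fetch` is a hypothetical callable standing in for `_fetch_from_api`:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def fetch_with_retry(fetch: Callable[[], Optional[T]], max_retries: int = 3,
                     base_delay: float = 0.0) -> Optional[T]:
    """Retry a fallible fetch; return None once all attempts are exhausted."""
    for attempt in range(max_retries):
        try:
            result = fetch()
        except Exception:
            result = None  # treat transport errors the same as a failed fetch
        if result is not None:
            return result
        if base_delay:
            time.sleep(base_delay * (2 ** attempt))  # assumed backoff schedule
    return None
```

This matches tests 5 and 6: a success on any attempt returns the tile, and three failures return None.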
## Unit Tests
1. **fetch_tile_cache_hit**: Tile in cache → returns immediately, no HTTP call
2. **fetch_tile_cache_miss**: Not cached → HTTP fetch, cache, return
3. **fetch_tile_api_error**: HTTP 500 → returns None
4. **fetch_tile_invalid_coords**: Invalid GPS → returns None
5. **fetch_tile_retry_success**: First attempt fails, second succeeds → returns tile
6. **fetch_tile_retry_exhausted**: All 3 attempts fail → returns None
7. **fetch_tile_grid_2x2**: grid_size=4 → returns dict with 4 tiles
8. **fetch_tile_grid_3x3**: grid_size=9 → returns dict with 9 tiles
9. **fetch_tile_grid_partial_failure**: 2 of 9 tiles fail → returns 7 tiles
10. **fetch_tile_grid_all_cached**: All tiles cached → no HTTP calls
11. **prefetch_route_corridor_success**: 10 waypoints → prefetches tiles, returns True
12. **prefetch_route_corridor_partial_failure**: Some tiles fail → continues, returns True
13. **prefetch_route_corridor_complete_failure**: All tiles fail → returns False
14. **progressive_fetch_yields_sequence**: [1,4,9] → yields 3 dicts in order
15. **progressive_fetch_early_termination**: Break after 4 → doesn't fetch 9, 16, 25
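The early-termination property in test 15 falls out naturally if `progressive_fetch` is implemented as a generator, since each grid level is only fetched when the consumer asks for it. `fetch_grid` below is a hypothetical callable standing in for the cache-aware grid-fetch step:

```python
from typing import Callable, Dict, Iterable, Iterator

def progressive_fetch(fetch_grid: Callable[[int], Dict[str, object]],
                      grid_sizes: Iterable[int]) -> Iterator[Dict[str, object]]:
    """Lazily yield one tile dict per grid level; breaking out of the
    consuming loop skips all remaining (larger) grids."""
    for size in grid_sizes:
        yield fetch_grid(size)
```

A caller that finds a match on the 2×2 grid simply breaks, and the 9-, 16-, and 25-tile grids are never requested.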
## Integration Tests
1. **fetch_and_cache_verify**: fetch_tile() → get_cached_tile() returns same data
2. **progressive_search_simulation**: progressive_fetch with simulated match on grid 9
3. **grid_expansion_no_refetch**: fetch_tile_grid(4) then expand → no duplicate fetches
4. **corridor_prefetch_coverage**: prefetch_route_corridor → all corridor tiles cached
5. **concurrent_fetch_stress**: Fetch 100 tiles in parallel → all complete within timeout
@@ -1,63 +0,0 @@
# Feature: Feature Extraction

## Description
SuperPoint-based keypoint and descriptor extraction from UAV images. Provides the foundation for visual odometry by detecting repeatable keypoints and computing discriminative 256-dimensional descriptors.

## Component APIs Implemented
- `extract_features(image: np.ndarray) -> Features`

## External Tools and Services
- **SuperPoint**: Neural network model for keypoint detection and descriptor extraction
- **F16 Model Manager**: Provides a pre-loaded SuperPoint model instance

## Internal Methods

### `_preprocess_image(image: np.ndarray) -> np.ndarray`
Converts the image to grayscale if needed and normalizes pixel values for model input.

### `_run_superpoint_inference(preprocessed: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]`
Executes SuperPoint model inference; returns raw keypoints, descriptors, and scores.

### `_apply_nms(keypoints: np.ndarray, scores: np.ndarray, nms_radius: int) -> np.ndarray`
Non-maximum suppression to filter keypoints; typically keeps 500-2000 keypoints per image.
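A pure-Python stand-in for the radius-based suppression that `_apply_nms` performs: keep the highest-scoring keypoint, drop everything within `nms_radius` of it, and repeat. The real implementation likely operates on GPU score maps; this greedy version only illustrates the selection rule:

```python
from typing import List, Sequence, Tuple

def apply_nms(keypoints: Sequence[Tuple[float, float]],
              scores: Sequence[float],
              nms_radius: float) -> List[Tuple[float, float]]:
    """Greedy radius-based non-maximum suppression over (x, y) keypoints."""
    order = sorted(range(len(keypoints)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    r2 = nms_radius ** 2
    for i in order:
        x, y = keypoints[i]
        # Keep i only if no already-kept keypoint lies within nms_radius of it
        if all((x - keypoints[j][0]) ** 2 + (y - keypoints[j][1]) ** 2 > r2 for j in kept):
            kept.append(i)
    return [keypoints[i] for i in kept]
```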
## Unit Tests

### Test: Grayscale Conversion
- Input: RGB image (H×W×3)
- Verify: _preprocess_image returns grayscale (H×W)

### Test: Grayscale Passthrough
- Input: Already grayscale image (H×W)
- Verify: _preprocess_image returns it unchanged

### Test: Feature Count Range
- Input: Standard UAV image
- Verify: Returns 500-2000 keypoints

### Test: Descriptor Dimensions
- Input: Any valid image
- Verify: Descriptors shape is (N, 256)

### Test: Empty Image Handling
- Input: Black/invalid image
- Verify: Returns empty Features (never raises an exception)

### Test: High Resolution Image
- Input: 6252×4168 image
- Verify: Extracts ~2000 keypoints within the performance budget

### Test: Low Texture Image
- Input: Uniform texture (sky, water)
- Verify: Returns fewer keypoints gracefully

## Integration Tests

### Test: Model Manager Integration
- Verify: Successfully retrieves the SuperPoint model from F16
- Verify: Model loaded with the correct TensorRT/ONNX backend

### Test: Performance Budget
- Input: FullHD image
- Verify: Extraction completes in <15ms on RTX 2060
@@ -1,73 +0,0 @@
# Feature: Feature Matching

## Description
LightGlue-based attention matching between feature sets from consecutive frames. Handles challenging low-overlap scenarios (<5%) using a transformer-based attention mechanism with adaptive depth.

## Component APIs Implemented
- `match_features(features1: Features, features2: Features) -> Matches`

## External Tools and Services
- **LightGlue**: Transformer-based feature matcher with adaptive depth
- **F16 Model Manager**: Provides a pre-loaded LightGlue model instance

## Internal Methods

### `_prepare_features_for_lightglue(features: Features) -> Dict`
Formats the Features dataclass into LightGlue-compatible tensor format.

### `_run_lightglue_inference(features1_dict: Dict, features2_dict: Dict) -> Tuple[np.ndarray, np.ndarray]`
Executes LightGlue inference; returns match indices and confidence scores. Uses adaptive depth (exits early for easy matches).

### `_filter_matches_by_confidence(matches: np.ndarray, scores: np.ndarray, threshold: float) -> Tuple[np.ndarray, np.ndarray]`
Filters matches below the confidence threshold (dustbin mechanism).

### `_extract_matched_keypoints(features1: Features, features2: Features, match_indices: np.ndarray) -> Tuple[np.ndarray, np.ndarray]`
Extracts matched keypoint coordinates from both feature sets using the match indices.
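The two post-processing helpers above can be sketched with plain lists standing in for the numpy arrays; match indices are assumed to be (i, j) pairs into the two keypoint sets:

```python
from typing import List, Sequence, Tuple

def filter_matches_by_confidence(matches: Sequence, scores: Sequence[float],
                                 threshold: float) -> Tuple[list, List[float]]:
    """Drop match pairs whose score falls below the threshold (the dustbin
    mechanism assigns such features no partner)."""
    kept = [(m, s) for m, s in zip(matches, scores) if s >= threshold]
    return [m for m, _ in kept], [s for _, s in kept]

def extract_matched_keypoints(keypoints1: Sequence, keypoints2: Sequence,
                              match_indices: Sequence[Tuple[int, int]]):
    """Gather the matched (x, y) coordinates from both feature sets."""
    pts1 = [keypoints1[i] for i, _ in match_indices]
    pts2 = [keypoints2[j] for _, j in match_indices]
    return pts1, pts2
```

Both outputs keep one entry per surviving match, which is exactly what the "Matched Keypoints Extraction" unit test checks.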
## Unit Tests

### Test: High Overlap Matching
- Input: Features from frames with >50% overlap
- Verify: Returns 500+ matches
- Verify: Inference time ~35ms (fast path)

### Test: Low Overlap Matching
- Input: Features from frames with 5-10% overlap
- Verify: Returns 20-50 matches
- Verify: Inference time ~100ms (full depth)

### Test: No Overlap Handling
- Input: Features from non-overlapping frames
- Verify: Returns <10 matches
- Verify: No exception raised

### Test: Match Index Validity
- Input: Any valid feature pairs
- Verify: All match indices are within the valid range for both feature sets

### Test: Confidence Score Range
- Input: Any valid feature pairs
- Verify: All scores in [0, 1] range

### Test: Empty Features Handling
- Input: Empty Features object
- Verify: Returns empty Matches (no exception)

### Test: Matched Keypoints Extraction
- Input: Features and match indices
- Verify: keypoints1 and keypoints2 arrays have the same length as matches

## Integration Tests

### Test: Model Manager Integration
- Verify: Successfully retrieves the LightGlue model from F16
- Verify: Model compatible with SuperPoint descriptors

### Test: Adaptive Depth Behavior
- Input: High overlap pair, then low overlap pair
- Verify: High overlap completes faster than low overlap

### Test: Agricultural Texture Handling
- Input: Features from repetitive wheat field images
- Verify: Produces valid matches despite repetitive patterns
@@ -0,0 +1,129 @@
# Feature: Combined Neural Inference

## Description
Single-pass SuperPoint+LightGlue TensorRT inference for feature extraction and matching. Takes two images as input and outputs matched keypoints directly, eliminating intermediate feature transfer overhead.

## Component APIs Implemented
- `extract_and_match(image1: np.ndarray, image2: np.ndarray) -> Matches`

## External Tools and Services
- **Combined SuperPoint+LightGlue TensorRT Engine**: Single model combining extraction and matching
- **F16 Model Manager**: Provides a pre-loaded TensorRT engine instance
- **Reference**: [D_VINS](https://github.com/kajo-kurisu/D_VINS/) for TensorRT optimization patterns

## Internal Methods

### `_preprocess_images(image1: np.ndarray, image2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]`
Converts images to grayscale if needed, normalizes pixel values, and resizes to model input dimensions.

### `_run_combined_inference(img1_tensor: np.ndarray, img2_tensor: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]`
Executes the combined TensorRT engine. Returns matched keypoints from both images and match confidence scores.

**Internal Pipeline** (within a single inference):
1. SuperPoint extracts keypoints + descriptors from both images
2. LightGlue performs attention-based matching with adaptive depth
3. Dustbin mechanism filters unmatched features
4. Returns only matched keypoint pairs

### `_filter_matches_by_confidence(keypoints1: np.ndarray, keypoints2: np.ndarray, scores: np.ndarray, threshold: float) -> Matches`
Filters low-confidence matches and constructs the final Matches object.
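The per-image half of `_preprocess_images` can be sketched as follows. The BT.601 luminance weights and the [0, 1] float32 normalization are assumptions; the deployed engine may use a different conversion, and the resize step is omitted here:

```python
import numpy as np

def preprocess_image(image: np.ndarray) -> np.ndarray:
    """RGB or grayscale -> normalized (1, 1, H, W) float32 tensor."""
    if image.ndim == 3:
        # Assumed BT.601 luminance weights for RGB -> grayscale
        image = image[..., 0] * 0.299 + image[..., 1] * 0.587 + image[..., 2] * 0.114
    img = image.astype(np.float32)
    if img.max() > 1.0:
        img /= 255.0  # normalize 8-bit input to [0, 1]
    return img[None, None, :, :]  # (1, 1, H, W), matching the engine's input shape
```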
## Architecture Notes

### Combined Model Benefits
- Single GPU memory transfer (both images together)
- No intermediate descriptor serialization
- Optimized attention layers for batch processing
- Adaptive depth exits early for easy (high-overlap) pairs

### TensorRT Engine Configuration
```
Input shapes:
  image1: (1, 1, H, W) - grayscale
  image2: (1, 1, H, W) - grayscale

Output shapes:
  keypoints1: (1, M, 2) - matched keypoints from image1
  keypoints2: (1, M, 2) - matched keypoints from image2
  scores:     (1, M)    - match confidence scores
```

### Model Export (reference from D_VINS)
```bash
trtexec --onnx='superpoint_lightglue_combined.onnx' \
    --fp16 \
    --minShapes=image1:1x1x480x752,image2:1x1x480x752 \
    --optShapes=image1:1x1x480x752,image2:1x1x480x752 \
    --maxShapes=image1:1x1x480x752,image2:1x1x480x752 \
    --saveEngine=sp_lg_combined.engine \
    --warmUp=500 --duration=10
```
## Unit Tests

### Test: Grayscale Conversion
- Input: Two RGB images (H×W×3)
- Verify: _preprocess_images returns grayscale tensors

### Test: Grayscale Passthrough
- Input: Two grayscale images (H×W)
- Verify: _preprocess_images returns them unchanged

### Test: High Overlap Matching
- Input: Two images with >50% overlap
- Verify: Returns 500+ matches
- Verify: Inference time ~35-50ms (adaptive depth fast path)

### Test: Low Overlap Matching
- Input: Two images with 5-10% overlap
- Verify: Returns 20-50 matches
- Verify: Inference time ~80ms (full depth)

### Test: No Overlap Handling
- Input: Two non-overlapping images
- Verify: Returns <10 matches
- Verify: No exception raised

### Test: Confidence Score Range
- Input: Any valid image pair
- Verify: All scores in [0, 1] range

### Test: Empty/Invalid Image Handling
- Input: Black/invalid image pair
- Verify: Returns empty Matches (never raises an exception)

### Test: High Resolution Images
- Input: Two 6252×4168 images
- Verify: Preprocessing resizes appropriately
- Verify: Completes within the performance budget

### Test: Output Shape Consistency
- Input: Any valid image pair
- Verify: keypoints1.shape[0] == keypoints2.shape[0] == scores.shape[0]

## Integration Tests

### Test: Model Manager Integration
- Verify: Successfully retrieves the combined SP+LG engine from F16
- Verify: Engine loaded with the correct TensorRT backend

### Test: Performance Budget
- Input: Two FullHD images
- Verify: Combined inference completes in <80ms on RTX 2060

### Test: Adaptive Depth Behavior
- Input: High overlap pair, then low overlap pair
- Verify: High overlap completes faster than low overlap

### Test: Agricultural Texture Handling
- Input: Two wheat field images with repetitive patterns
- Verify: Produces valid matches despite the repetitive textures

### Test: Memory Efficiency
- Verify: Single GPU memory allocation vs. two separate models
- Verify: No intermediate descriptor buffer allocation

### Test: Batch Consistency
- Input: Same image pair multiple times
- Verify: Consistent match results (deterministic)
@@ -1,7 +1,7 @@
# Feature: Geometric Pose Estimation (renamed from "Relative Pose Computation")

## Description
Estimates camera motion from matched keypoints using Essential Matrix decomposition. Orchestrates the full visual odometry pipeline and provides tracking quality assessment. Pure geometric computation (non-ML).

## Component APIs Implemented
- `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`

@@ -16,19 +16,28 @@
## Internal Methods

### `_normalize_keypoints(keypoints: np.ndarray, camera_params: CameraParameters) -> np.ndarray`
Normalizes pixel coordinates to camera-centered coordinates using the intrinsic matrix K:
```
normalized = K^(-1) @ [x, y, 1]^T
```

### `_estimate_essential_matrix(points1: np.ndarray, points2: np.ndarray) -> Tuple[Optional[np.ndarray], np.ndarray]`
RANSAC-based Essential Matrix estimation. Returns the E matrix and an inlier mask.
- Uses cv2.findEssentialMat with RANSAC
- Requires a minimum of 8 point correspondences
- Returns None if there are insufficient inliers

### `_decompose_essential_matrix(E: np.ndarray, points1: np.ndarray, points2: np.ndarray, camera_params: CameraParameters) -> Tuple[np.ndarray, np.ndarray]`
Decomposes the Essential Matrix into rotation R and translation t.
- Uses cv2.recoverPose
- Selects the correct solution via the cheirality check
- Translation is a unit vector (scale ambiguous)

### `_compute_tracking_quality(inlier_count: int, total_matches: int) -> Tuple[float, bool]`
Computes the confidence score and tracking_good flag:
- **Good**: inlier_count > 50, inlier_ratio > 0.5 → confidence > 0.8
- **Degraded**: inlier_count 20-50 → confidence 0.4-0.8
- **Lost**: inlier_count < 20 → tracking_good = False

### `_build_relative_pose(motion: Motion, matches: Matches) -> RelativePose`
Constructs the RelativePose dataclass from the motion estimate and match statistics.
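For a pinhole intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]], the `K^(-1) @ [x, y, 1]^T` normalization reduces to shifting by the principal point and dividing by the focal length. A minimal sketch (the function name and parameter layout are illustrative, not the project's actual signature):

```python
import numpy as np

def normalize_keypoints(keypoints: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Apply K^-1 to (N, 2) pixel coordinates -> camera-centered coordinates."""
    out = np.empty_like(keypoints, dtype=np.float64)
    out[:, 0] = (keypoints[:, 0] - cx) / fx  # shift by principal point, scale by focal length
    out[:, 1] = (keypoints[:, 1] - cy) / fy
    return out
```

With fx=1000, cx=640, cy=360 (the values used in the Keypoint Normalization unit test), the principal point maps to the origin.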
@@ -36,41 +45,51 @@
## Unit Tests

### Test: Keypoint Normalization
- Input: Pixel coordinates and camera params (fx=1000, cx=640, cy=360)
- Verify: Output centered at principal point, scaled by focal length

### Test: Essential Matrix Estimation - Good Data
- Input: 100+ inlier correspondences from known motion
- Verify: Returns valid Essential Matrix
- Verify: det(E) ≈ 0
- Verify: Singular values satisfy σ₁ ≈ σ₂, σ₃ ≈ 0

### Test: Essential Matrix Estimation - Insufficient Points
- Input: <8 point correspondences
- Verify: Returns None

### Test: Essential Matrix Decomposition
- Input: Valid Essential Matrix from known motion
- Verify: Returns valid rotation (det(R) = 1, R^T R = I)
- Verify: Translation is unit vector (||t|| = 1)

### Test: Tracking Quality - Good
- Input: inlier_count=100, total_matches=150
- Verify: tracking_good=True
- Verify: confidence > 0.8

### Test: Tracking Quality - Degraded
- Input: inlier_count=30, total_matches=50
- Verify: tracking_good=True
- Verify: 0.4 < confidence < 0.8

### Test: Tracking Quality - Lost
- Input: inlier_count=10, total_matches=20
- Verify: tracking_good=False

### Test: Scale Ambiguity Marker
- Input: Any valid motion estimate
- Verify: translation vector has unit norm (||t|| = 1)
- Verify: scale_ambiguous flag is True

### Test: Pure Rotation Handling
- Input: Matches from pure rotational motion (no translation)
- Verify: Returns valid pose
- Verify: translation ≈ [0, 0, 0] or arbitrary unit vector

### Test: Forward Motion
- Input: Matches from forward camera motion
- Verify: translation z-component is positive
## Integration Tests
|
||||
|
||||
@@ -78,7 +97,7 @@ Constructs RelativePose dataclass from motion estimate and match statistics.
|
||||
- Input: Consecutive frames with 50% overlap
|
||||
- Verify: Returns valid RelativePose
|
||||
- Verify: inlier_count > 100
|
||||
- Verify: Total time < 200ms
|
||||
- Verify: Total time < 150ms
|
||||
|
||||
### Test: Full Pipeline - Low Overlap
|
||||
- Input: Frames with 5% overlap
|
||||
@@ -88,7 +107,7 @@ Constructs RelativePose dataclass from motion estimate and match statistics.
|
||||
### Test: Full Pipeline - Tracking Loss
|
||||
- Input: Non-overlapping frames (sharp turn)
|
||||
- Verify: Returns None
|
||||
- Verify: tracking_good would be False
|
||||
- Verify: No exception raised
|
||||
|
||||
### Test: Configuration Manager Integration
|
||||
- Verify: Successfully retrieves camera_params from F17
|
||||
@@ -99,11 +118,16 @@ Constructs RelativePose dataclass from motion estimate and match statistics.
|
||||
- Verify: Consistent with opencv undistortion
|
||||
|
||||
### Test: Pipeline Orchestration
|
||||
- Verify: extract_features called twice (prev, curr)
|
||||
- Verify: match_features called once
|
||||
- Verify: extract_and_match called once (combined inference)
|
||||
- Verify: estimate_motion called with correct params
|
||||
- Verify: Returns RelativePose with all fields populated
|
||||
|
||||
### Test: Agricultural Environment
|
||||
- Input: Wheat field images with repetitive texture
|
||||
- Verify: Pipeline succeeds with reasonable inlier count
|
||||
|
||||
### Test: Known Motion Validation
|
||||
- Input: Synthetic image pair with known ground truth motion
|
||||
- Verify: Estimated rotation within ±2° of ground truth
|
||||
- Verify: Estimated translation direction within ±5° of ground truth
|
||||
|
||||
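The Essential Matrix properties exercised above (det(E) ≈ 0, singular values σ₁ ≈ σ₂, σ₃ ≈ 0) follow directly from the construction E = [t]×R with a unit translation. A minimal numpy sketch of that check — the helper names here are illustrative, not part of the F07 API:

```python
import numpy as np

def essential_from_motion(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build E = [t]x R from a rotation matrix and a unit translation."""
    tx, ty, tz = t
    t_cross = np.array([[0.0, -tz,  ty],
                        [ tz, 0.0, -tx],
                        [-ty,  tx, 0.0]])
    return t_cross @ R

def is_valid_essential(E: np.ndarray, tol: float = 1e-9) -> bool:
    """Check the defining properties used by the unit tests above."""
    s = np.linalg.svd(E, compute_uv=False)
    return (abs(np.linalg.det(E)) < tol      # det(E) ~ 0
            and abs(s[0] - s[1]) < tol       # two equal singular values
            and s[2] < tol)                  # third singular value ~ 0

# Example: 10-degree yaw rotation with forward unit translation
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0])  # unit vector: scale stays ambiguous
E = essential_from_motion(R, t)
```

Any E built this way from a valid rotation and unit translation passes the check, which is what the "Good Data" test asserts for the estimator's output.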
# Feature: Relative Pose Computation

## Description
Orchestrates the full visual odometry pipeline and estimates camera motion from matched features using Essential Matrix decomposition. Computes relative pose between consecutive frames and provides tracking quality indicators.

## Component APIs Implemented
- `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
- `estimate_motion(matches: Matches, camera_params: CameraParameters) -> Optional[Motion]`

## External Tools and Services
- **opencv-python**: Essential Matrix estimation via RANSAC, matrix decomposition
- **numpy**: Matrix operations, coordinate normalization
- **F17 Configuration Manager**: Camera parameters (focal length, principal point)
- **H01 Camera Model**: Coordinate normalization utilities

## Internal Methods

### `_normalize_keypoints(keypoints: np.ndarray, camera_params: CameraParameters) -> np.ndarray`
Normalizes pixel coordinates to camera-centered coordinates using the intrinsic matrix.

### `_estimate_essential_matrix(points1: np.ndarray, points2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]`
RANSAC-based Essential Matrix estimation; returns the E matrix and an inlier mask.

### `_decompose_essential_matrix(E: np.ndarray, points1: np.ndarray, points2: np.ndarray) -> Tuple[np.ndarray, np.ndarray]`
Decomposes the Essential Matrix into rotation R and translation t (unit vector).

### `_compute_tracking_quality(inlier_count: int, total_matches: int) -> Tuple[float, bool]`
Computes confidence score and tracking_good flag based on inlier statistics.
- Good: inlier_count > 50, inlier_ratio > 0.5
- Degraded: inlier_count 20-50
- Lost: inlier_count < 20

### `_build_relative_pose(motion: Motion, matches: Matches) -> RelativePose`
Constructs RelativePose dataclass from motion estimate and match statistics.
```python
class ISequentialVisualOdometry(ABC):
    ...

    @abstractmethod
    def extract_and_match(self, image1: np.ndarray, image2: np.ndarray) -> Matches:
        pass

    ...
```
## Component Description

### Responsibilities
- Combined SuperPoint+LightGlue neural network inference for extraction and matching
- Handle <5% overlap scenarios via LightGlue attention mechanism
- Estimate relative pose (translation + rotation) between frames
- Return relative pose factors for Factor Graph Optimizer
- Detect tracking loss (low inlier count)

### Scope
- Frame-to-frame visual odometry
- Feature-based motion estimation using combined neural network
- Handles low overlap and challenging agricultural environments
- Provides relative measurements for trajectory optimization
- **Chunk-agnostic**: F07 doesn't know about chunks. Caller (F02.2) routes results to appropriate chunk subgraph.

### Architecture Notes
- Uses combined SuperPoint+LightGlue TensorRT model (single inference pass)
- Reference: [D_VINS](https://github.com/kajo-kurisu/D_VINS/) for TensorRT optimization patterns
- Model outputs matched keypoints directly from two input images

## API Methods

### `compute_relative_pose(prev_image: np.ndarray, curr_image: np.ndarray) -> Optional[RelativePose]`
**Processing Flow**:
1. extract_and_match(prev_image, curr_image) → matches
2. estimate_motion(matches, camera_params) → motion
3. Return RelativePose

**Tracking Quality Indicators**:
- **Good tracking**: inlier_count > 50, inlier_ratio > 0.5
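One possible mapping for `_compute_tracking_quality` that is consistent with these bands — the confidence formula itself is an assumption; the spec only fixes the Good/Degraded/Lost thresholds and the confidence ranges asserted in the unit tests:

```python
from typing import Tuple

def compute_tracking_quality(inlier_count: int, total_matches: int) -> Tuple[float, bool]:
    """Map inlier statistics to (confidence, tracking_good).

    Bands from the spec: Good > 50 inliers with ratio > 0.5,
    Degraded 20-50 inliers, Lost < 20 inliers.
    The exact confidence formula below is illustrative only.
    """
    ratio = inlier_count / total_matches if total_matches > 0 else 0.0
    if inlier_count < 20:
        return 0.2 * ratio, False  # lost: tracking_good=False
    # blend absolute inlier count and inlier ratio into a [0, 1] score
    confidence = 0.5 * ratio + 0.5 * min(1.0, inlier_count / 100)
    return confidence, True
```

With the values from the unit tests, (100, 150) yields tracking_good with confidence above 0.8, (30, 50) yields a degraded-but-tracking result between 0.4 and 0.8, and (10, 20) reports tracking loss.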
---

### `extract_and_match(image1: np.ndarray, image2: np.ndarray) -> Matches`

**Description**: Single-pass neural network inference combining SuperPoint feature extraction and LightGlue matching.

**Called By**:
- Internal (during compute_relative_pose)

**Input**:
```python
image1: np.ndarray  # First image (H×W×3 or H×W)
image2: np.ndarray  # Second image (H×W×3 or H×W)
```
**Output**:
```python
Matches:
    matches: np.ndarray      # (M, 2) - indices [idx1, idx2]
    scores: np.ndarray       # (M,) - match confidence scores
    keypoints1: np.ndarray   # (M, 2) - matched keypoints from image 1
    keypoints2: np.ndarray   # (M, 2) - matched keypoints from image 2
```

**Processing Details**:
- Uses F16 Model Manager to get combined SuperPoint+LightGlue TensorRT engine
- Single inference pass processes both images
- Converts to grayscale internally if needed
- SuperPoint extracts keypoints + 256-dim descriptors
- LightGlue performs attention-based matching with adaptive depth
- "Dustbin" mechanism handles unmatched features

**Performance** (TensorRT on RTX 2060):
- Combined inference: ~50-80ms (vs ~65-115ms separate)
- Faster for high-overlap pairs (adaptive depth exits early)

**Test Cases**:
1. **High overlap**: ~35ms, 500+ matches
2. **Low overlap (<5%)**: ~80ms, 20-50 matches
3. **No overlap**: Few or no matches (< 10)

---
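`estimate_motion` recovers the Essential Matrix from these matched keypoints. As an illustration of the underlying estimation step, here is a RANSAC-free eight-point sketch in plain numpy — illustrative only: the component uses OpenCV's RANSAC-based estimator, and `eight_point_essential` is not part of the F07 API:

```python
from typing import Optional
import numpy as np

def eight_point_essential(p1: np.ndarray, p2: np.ndarray) -> Optional[np.ndarray]:
    """Estimate E from >= 8 normalized correspondences, each of shape (N, 2).

    Solves x2^T E x1 = 0 in least squares, then projects the result onto
    the essential manifold (singular values 1, 1, 0). No outlier handling.
    """
    if len(p1) < 8:
        return None  # mirrors the spec: insufficient points -> None
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    # each row encodes one epipolar constraint on the 9 entries of E
    A = np.column_stack([x2 * x1, x2 * y1, x2,
                         y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # null vector = flattened E
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt  # enforce (1, 1, 0) spectrum

# Synthetic correspondences from a known sideways motion (normalized coords)
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, (20, 3)) + np.array([0.0, 0.0, 5.0])
t = np.array([1.0, 0.0, 0.0])         # camera 2 shifted along +x, R = I
proj1 = pts[:, :2] / pts[:, 2:]       # projection in camera 1
pts2 = pts - t                        # same points in camera 2 coordinates
proj2 = pts2[:, :2] / pts2[:, 2:]
E = eight_point_essential(proj1, proj2)
```

On noiseless data the recovered E satisfies the epipolar constraint to machine precision; the production path adds RANSAC on top of this to reject bad matches.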
**Critical Handoff to F10**:
The caller (F02.2) must pass the unit translation to F10 for scale resolution:
```python
# F02.2 receives RelativePose from F07
vo_result = F07.compute_relative_pose(prev_image, curr_image)
# vo_result.translation is a UNIT VECTOR (||t|| = 1)
...
F10.add_relative_factor(flight_id, frame_i, frame_j, vo_result, covariance)
```

1. Load frames with 5% overlap
2. compute_relative_pose() → still succeeds
3. Verify inlier_count > 20
4. Verify combined model finds matches despite low overlap
### Test 3: Tracking Loss
1. Load frames with 0% overlap (sharp turn)

## Non-Functional Requirements

### Performance
- **compute_relative_pose**: < 150ms total
  - Combined SP+LG inference: ~50-80ms
  - Motion estimation: ~10ms
- **Frame rate**: 5-10 FPS processing (meets <5s requirement)
## Dependencies

### Internal Components
- **F16 Model Manager**: For combined SuperPoint+LightGlue TensorRT model
- **F17 Configuration Manager**: For camera parameters
- **H01 Camera Model**: For coordinate normalization
- **H05 Performance Monitor**: For timing measurements

**Note**: F07 is chunk-agnostic and does NOT depend on F10 Factor Graph Optimizer. F07 only computes relative poses between images and returns them to the caller (F02.2). The caller (F02.2) determines which chunk the frames belong to and routes factors to the appropriate subgraph via F12 → F10.

### External Dependencies
- **Combined SuperPoint+LightGlue**: Single TensorRT engine for extraction + matching
- **opencv-python**: Essential Matrix estimation
- **numpy**: Matrix operations

## Data Models

### Matches
```python
class Matches(BaseModel):
    ...
```

### RelativePose
```python
class RelativePose(BaseModel):
    ...
    total_matches: int
    tracking_good: bool
    scale_ambiguous: bool = True
    chunk_id: Optional[str] = None
```

### Motion
```python
class Motion(BaseModel):
    ...
    inlier_count: int
```

### CameraParameters
```python
class CameraParameters(BaseModel):
    focal_length: float
    principal_point: Tuple[float, float]
    resolution: Tuple[int, int]
```
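Given the `CameraParameters` model, the `_normalize_keypoints` step reduces to shifting pixel coordinates by the principal point and dividing by the focal length. A minimal sketch using the fx=1000, cx=640, cy=360 values from the unit tests — `normalize_keypoints` here is an illustrative free function; the component delegates this to H01 Camera Model:

```python
from typing import Tuple
import numpy as np

def normalize_keypoints(keypoints: np.ndarray,
                        focal_length: float,
                        principal_point: Tuple[float, float]) -> np.ndarray:
    """Map (N, 2) pixel coordinates to camera-centered normalized coordinates."""
    cx, cy = principal_point
    return (keypoints - np.array([cx, cy])) / focal_length

# Camera params from the Keypoint Normalization unit test
kpts = np.array([[640.0, 360.0], [1640.0, 360.0], [640.0, 1360.0]])
norm = normalize_keypoints(kpts, focal_length=1000.0, principal_point=(640.0, 360.0))
```

The principal point maps to the origin and a point one focal length away along x maps to (1, 0), which is exactly what the unit test verifies.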
# 1. Research Phase

## 1.0 Problem statement

### Discuss
Discuss the problem and create in `docs/00_problem` the following files and folders:

# 2. Planning phase

## 2.10 **🤖📋AI plan**: Generate components

### Execute `/2.planning/2.10_gen_components`

### Revise
- Clarify the proposals and ask to fix found issues

## 2.20 **🤖AI agent**: Generate Jira Epics

### Jira MCP
Add Jira MCP to the list in IDE:
```
"Jira-MCP-Server": {
    "url": "https://mcp.atlassian.com/v1/sse"
}
```

### Execute `/2.planning/2.20_gen_epics use jira mcp`

### Revise
- Revise the epics, answer questions, put detailed descriptions
- Make sure epics are coherent and make sense
- Revise the tests, answer questions, put detailed descriptions
- Make sure stored tests are coherent and make sense

## 2.40 **🤖📋AI agent**: Component Decomposition To Features

### Execute
For each component in `docs/02_components` run
`/2.planning/2.40_gen_features --component @xx__spec_[component_name].md`

### Revise
- Revise the features, answer questions, put detailed descriptions
# 3. Development phase

## 3.05 **🤖AI agent**: Initial structure

### Execute `/3.implementation/3.05_implement_initial_structure`

### Review
- Analyze the code, ask to do some adjustments if needed

## 3.10 **🤖📋AI plan**: Feature implementation

### Execute
For each component in `docs/02_components` run
`/3.implementation/3.10_implement_component @component_folder`

### Revise Plan
- Analyze the proposed development plan in great detail, provide all necessary information
- Reorganize the plan if needed; think through and add more input constraints where useful
- Improve the plan as much as possible so it is clear exactly what to do

### Save Plan
- When the plan is final and ready, save it as `[##]._plan_[component_name]` in the component's folder

### Execute Plan
- Press build and let AI generate the code

### Revise Code
- Read the code and check that everything is ok

## 3.20 **🤖AI agent**: Solution composition and integration tests
```
Read all the files here `docs/03_tests/` and for each file write down tests and run them.
Compose the final test results in a csv with the following format:
...
Repeat the test cycle until there are no failed tests.
```

# 4. Refactoring phase