add chunking

Oleksandr Bezdieniezhnykh
2025-11-27 03:43:19 +02:00
parent 4f8c18a066
commit 2037870f67
43 changed files with 7041 additions and 4135 deletions
# Global Place Recognition
## Interface Definition
**Interface Name**: `IGlobalPlaceRecognition`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class IGlobalPlaceRecognition(ABC):
    @abstractmethod
    def retrieve_candidate_tiles(self, image: np.ndarray, top_k: int) -> List[TileCandidate]:
        pass

    @abstractmethod
    def compute_location_descriptor(self, image: np.ndarray) -> np.ndarray:
        pass

    @abstractmethod
    def query_database(self, descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]:
        pass

    @abstractmethod
    def rank_candidates(self, candidates: List[TileCandidate]) -> List[TileCandidate]:
        pass

    @abstractmethod
    def initialize_database(self, satellite_tiles: List[SatelliteTile]) -> bool:
        pass

    @abstractmethod
    def retrieve_candidate_tiles_for_chunk(self, chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]:
        pass

    @abstractmethod
    def compute_chunk_descriptor(self, chunk_images: List[np.ndarray]) -> np.ndarray:
        pass
```
## Component Description
### Responsibilities
- AnyLoc (DINOv2 + VLAD) for coarse localization after tracking loss
- "Kidnapped robot" recovery after sharp turns
- Compute image descriptors robust to season/appearance changes
- Query Faiss index of satellite tile descriptors
- Return top-k candidate tile regions for progressive refinement
- Initialize satellite descriptor database during system startup
- **Chunk semantic matching (aggregate DINOv2 features)**
- **Chunk descriptor computation for robust matching**
### Scope
- Global localization (not frame-to-frame)
- Appearance-based place recognition
- Handles domain gap (UAV vs satellite imagery)
- Semantic feature extraction (DINOv2)
- Efficient similarity search (Faiss)
- **Chunk-level matching (more robust than single-image)**
## API Methods
### `retrieve_candidate_tiles(image: np.ndarray, top_k: int) -> List[TileCandidate]`
**Description**: Retrieves top-k candidate satellite tiles for a UAV image.
**Called By**:
- F11 Failure Recovery Coordinator (after tracking loss)
**Input**:
```python
image: np.ndarray # UAV image
top_k: int # Number of candidates (typically 5)
```
**Output**:
```python
List[TileCandidate]:
tile_id: str
gps_center: GPSPoint
similarity_score: float
rank: int
```
**Processing Flow**:
1. compute_location_descriptor(image) → descriptor
2. query_database(descriptor, top_k) → database_matches
3. Retrieve tile metadata for matches
4. rank_candidates() → sorted by similarity
5. Return top-k candidates
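The flow above can be sketched as follows. This is an illustrative sketch, not the production implementation: descriptor extraction and the database query are passed in as callables, and `TileCandidate` is reduced to the fields this step touches.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class TileCandidate:
    tile_id: str
    similarity_score: float
    rank: int = 0


def retrieve_candidate_tiles(
    image: np.ndarray,
    top_k: int,
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
    query_database: Callable[[np.ndarray, int], List[TileCandidate]],
) -> List[TileCandidate]:
    # 1. Compute the global descriptor for the query image.
    descriptor = compute_descriptor(image)
    # 2-3. Query the descriptor database for the nearest tiles.
    matches = query_database(descriptor, top_k)
    # 4. Rank by similarity (highest first) and assign rank indices.
    ranked = sorted(matches, key=lambda c: c.similarity_score, reverse=True)
    for i, cand in enumerate(ranked):
        cand.rank = i
    # 5. Return the top-k candidates; an empty list signals a failed query.
    return ranked[:top_k]
```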
**Error Conditions**:
- Returns empty list: Database not initialized, query failed
**Test Cases**:
1. **UAV image over Ukraine**: Returns relevant tiles
2. **Different season**: DINOv2 handles appearance change
3. **Recall@5**: Correct tile in top-5 > 85%
---
### `compute_location_descriptor(image: np.ndarray) -> np.ndarray`
**Description**: Computes global descriptor using DINOv2 + VLAD aggregation.
**Called By**:
- Internal (during retrieve_candidate_tiles)
- System initialization (for satellite database)
**Input**:
```python
image: np.ndarray # UAV or satellite image
```
**Output**:
```python
np.ndarray: Descriptor vector (4096-dim or 8192-dim)
```
**Algorithm (AnyLoc)**:
1. Extract DINOv2 features (dense feature map)
2. Apply VLAD (Vector of Locally Aggregated Descriptors) aggregation
3. L2-normalize descriptor
4. Return compact global descriptor
**Processing Details**:
- Uses F16 Model Manager to get DINOv2 model
- Dense features: extracts from multiple spatial locations
- VLAD codebook: pre-trained cluster centers
- Semantic features: invariant to texture/color changes
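A minimal NumPy sketch of the VLAD aggregation step described above; the DINOv2 feature extraction itself is omitted, and the per-cluster intra-normalization shown here is one common variant, not necessarily the exact AnyLoc configuration.

```python
import numpy as np


def vlad_aggregate(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """VLAD: for each cluster, sum residuals of the features assigned to it.

    features: (N, D) dense local features (e.g. DINOv2 patch tokens)
    codebook: (K, D) pre-trained cluster centers
    returns:  (K*D,) L2-normalized global descriptor
    """
    # Hard-assign each feature to its nearest cluster center.
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)
    K, D = codebook.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = features[assign == k]
        if len(members):
            # Accumulate residuals to the cluster center.
            vlad[k] = (members - codebook[k]).sum(axis=0)
    # Intra-normalize per cluster, then flatten and L2-normalize.
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = np.where(norms > 0, vlad / norms, vlad)
    out = vlad.ravel()
    return out / (np.linalg.norm(out) + 1e-12)
```

With a 4096-dim descriptor, K*D might be, for example, 32 clusters of 128-dim features; the spec leaves the exact factorization open.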
**Performance**:
- Inference time: ~150ms for DINOv2 + VLAD
**Test Cases**:
1. **Same location, different season**: Similar descriptors
2. **Different locations**: Dissimilar descriptors
3. **UAV vs satellite**: Domain-invariant features
---
### `query_database(descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]`
**Description**: Queries Faiss index for most similar satellite tiles.
**Called By**:
- Internal (during retrieve_candidate_tiles)
**Input**:
```python
descriptor: np.ndarray # Query descriptor
top_k: int
```
**Output**:
```python
List[DatabaseMatch]:
index: int # Tile index in database
distance: float # L2 distance
similarity_score: float # Normalized score
```
**Processing Details**:
- Uses H04 Faiss Index Manager
- Index type: IVF (Inverted File) or HNSW for fast search
- Distance metric: L2 (Euclidean)
- Query time: ~10-50ms for 10,000+ tiles
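As an illustration, an exhaustive L2 search over the descriptor matrix behaves like `faiss.IndexFlatL2` on a small database (IVF and HNSW trade exactness for speed on larger ones). The distance-to-similarity mapping below is one simple convention, not prescribed by the spec.

```python
from typing import List, Tuple

import numpy as np


def query_descriptors(
    db: np.ndarray, query: np.ndarray, top_k: int
) -> List[Tuple[int, float, float]]:
    """Exhaustive L2 search over (N, D) tile descriptors.

    Returns (index, l2_distance, similarity_score) triples,
    nearest first, mirroring the DatabaseMatch fields.
    """
    dists = np.linalg.norm(db - query[None, :], axis=1)
    order = np.argsort(dists)[:top_k]
    # Map distance to a (0, 1] similarity score (illustrative convention).
    return [(int(i), float(dists[i]), float(1.0 / (1.0 + dists[i]))) for i in order]
```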
**Error Conditions**:
- Returns empty list: Query failed
**Test Cases**:
1. **Query satellite database**: Returns top-5 matches
2. **Large database (10,000 tiles)**: Fast retrieval (<50ms)
---
### `rank_candidates(candidates: List[TileCandidate]) -> List[TileCandidate]`
**Description**: Re-ranks candidates based on additional heuristics.
**Called By**:
- Internal (during retrieve_candidate_tiles)
**Input**:
```python
candidates: List[TileCandidate] # Initial ranking by similarity
```
**Output**:
```python
List[TileCandidate] # Re-ranked list
```
**Re-ranking Factors**:
1. **Similarity score**: Primary factor
2. **Spatial proximity**: Prefer tiles near dead-reckoning estimate
3. **Previous trajectory**: Favor continuation of route
4. **Geofence constraints**: Within operational area
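A sketch of how the first two factors might be combined. The weights, the distance-based spatial score, and the simplified `Cand` type are illustrative assumptions; the trajectory and geofence factors are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cand:
    tile_id: str
    center: Tuple[float, float]  # simplified stand-in for GPSPoint
    similarity_score: float
    spatial_score: float = 0.0


def rank_candidates(
    cands: List[Cand],
    dead_reckoning: Optional[Tuple[float, float]] = None,
    w_sim: float = 0.7,
    w_spatial: float = 0.3,
) -> List[Cand]:
    def combined(c: Cand) -> float:
        if dead_reckoning is None:
            return c.similarity_score
        # Closer tiles get a higher spatial score (illustrative scoring).
        c.spatial_score = 1.0 / (1.0 + math.dist(c.center, dead_reckoning))
        return w_sim * c.similarity_score + w_spatial * c.spatial_score

    return sorted(cands, key=combined, reverse=True)
```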
**Test Cases**:
1. **Spatial re-ranking**: Closer tile promoted
2. **Similar scores**: Spatial proximity breaks tie
---
### `initialize_database(satellite_tiles: List[SatelliteTile]) -> bool`
**Description**: Initializes satellite descriptor database during system startup.
**Called By**:
- F02 Flight Manager (during system initialization)
**Input**:
```python
List[SatelliteTile]:
tile_id: str
image: np.ndarray
gps_center: GPSPoint
bounds: TileBounds
```
**Output**:
```python
bool: True if database initialized successfully
```
**Processing Flow**:
1. For each satellite tile:
- compute_location_descriptor(tile.image) → descriptor
- Store descriptor with tile metadata
2. Build Faiss index using H04 Faiss Index Manager
3. Persist index to disk for fast startup
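A sketch of this flow under simplifying assumptions: tiles are reduced to `(tile_id, image)` pairs, and the descriptors are persisted as a raw NumPy matrix rather than a real Faiss index (in the real system, step 2 goes through H04).

```python
from typing import Callable, List, Tuple

import numpy as np


def initialize_database(
    tiles: List[Tuple[str, np.ndarray]],
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
    index_path: str,
) -> bool:
    try:
        # 1. Compute one descriptor per satellite tile.
        ids = [tid for tid, _ in tiles]
        descs = np.stack([compute_descriptor(img) for _, img in tiles])
        # 2-3. H04 would build and persist a Faiss index (IVF/HNSW) here;
        # we persist the raw descriptor matrix instead.
        np.savez(index_path, ids=np.array(ids), descriptors=descs)
        return True
    except Exception:
        return False
```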
**Performance**:
- Initialization time: ~10-30 minutes for 10,000 tiles (one-time cost)
- Can be done offline and loaded at startup
**Test Cases**:
1. **Initialize with 1000 tiles**: Completes successfully
2. **Load pre-built index**: Fast startup (<10s)
---
### `retrieve_candidate_tiles_for_chunk(chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]`
**Description**: Retrieves top-k candidate satellite tiles for a chunk using aggregate descriptor.
**Called By**:
- F11 Failure Recovery Coordinator (chunk semantic matching)
- F12 Route Chunk Manager (chunk matching coordination)
**Input**:
```python
chunk_images: List[np.ndarray] # 5-20 images from chunk
top_k: int # Number of candidates (typically 5)
```
**Output**:
```python
List[TileCandidate]:
tile_id: str
gps_center: GPSPoint
similarity_score: float
rank: int
```
**Processing Flow**:
1. compute_chunk_descriptor(chunk_images) → aggregate descriptor
2. query_database(descriptor, top_k) → database_matches
3. Retrieve tile metadata for matches
4. rank_candidates() → sorted by similarity
5. Return top-k candidates
**Advantages over Single-Image Matching**:
- Aggregate descriptor more robust to featureless terrain
- Multiple images provide more context
- Better handles plain fields where single-image matching fails
**Test Cases**:
1. **Chunk matching**: Returns relevant tiles
2. **Featureless terrain**: Succeeds where single-image fails
3. **Recall@5**: Correct tile in top-5 > 90% (better than single-image)
---
### `compute_chunk_descriptor(chunk_images: List[np.ndarray]) -> np.ndarray`
**Description**: Computes aggregate DINOv2 descriptor from multiple chunk images.
**Called By**:
- Internal (during retrieve_candidate_tiles_for_chunk)
- F12 Route Chunk Manager (delegates chunk descriptor computation to F08)
**Input**:
```python
chunk_images: List[np.ndarray] # 5-20 images from chunk
```
**Output**:
```python
np.ndarray: Aggregated descriptor vector (4096-dim or 8192-dim)
```
**Algorithm**:
1. For each image in chunk:
- compute_location_descriptor(image) → descriptor (DINOv2 + VLAD)
2. Aggregate descriptors:
- **Mean aggregation**: Average all descriptors
- **VLAD aggregation**: Use VLAD codebook for aggregation
- **Max aggregation**: Element-wise maximum
3. L2-normalize aggregated descriptor
4. Return composite descriptor
**Aggregation Strategy**:
- **Mean**: Simple average (default)
- **VLAD**: More sophisticated, preserves spatial information
- **Max**: Emphasizes strongest features
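The mean and max strategies can be sketched directly; the VLAD variant needs the pre-trained codebook and is omitted here. The small epsilon is a guard against a zero vector, an implementation detail the spec leaves open.

```python
import numpy as np


def compute_chunk_descriptor(descriptors: np.ndarray, method: str = "mean") -> np.ndarray:
    """Aggregate per-image descriptors (N, D) into one chunk descriptor (D,)."""
    if method == "mean":
        agg = descriptors.mean(axis=0)
    elif method == "max":
        agg = descriptors.max(axis=0)  # element-wise maximum
    else:
        raise ValueError(f"unknown aggregation: {method}")
    # L2-normalize so chunk and single-image descriptors are comparable.
    return agg / (np.linalg.norm(agg) + 1e-12)
```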
**Performance**:
- Descriptor computation: ~150ms × N images (can be parallelized)
- Aggregation: ~10ms
**Test Cases**:
1. **Compute descriptor**: Returns aggregated descriptor
2. **Multiple images**: Descriptor aggregates correctly
3. **Descriptor quality**: More robust than single-image descriptor
## Integration Tests
### Test 1: Place Recognition Flow
1. Load UAV image from sharp turn
2. retrieve_candidate_tiles(top_k=5)
3. Verify correct tile in top-5
4. Pass candidates to F11 Failure Recovery
### Test 2: Season Invariance
1. Satellite tiles from summer
2. UAV images from autumn
3. retrieve_candidate_tiles() → correct match despite appearance change
### Test 3: Database Initialization
1. Prepare 500 satellite tiles
2. initialize_database(tiles)
3. Verify Faiss index built
4. Query with test image → returns matches
### Test 4: Chunk Semantic Matching
1. Build chunk with 10 images (plain field scenario)
2. compute_chunk_descriptor() → aggregate descriptor
3. retrieve_candidate_tiles_for_chunk() → returns candidates
4. Verify correct tile in top-5 (where single-image matching failed)
5. Verify chunk matching more robust than single-image
## Non-Functional Requirements
### Performance
- **retrieve_candidate_tiles**: < 200ms total
- Descriptor computation: ~150ms
- Database query: ~50ms
- **compute_location_descriptor**: ~150ms
- **query_database**: ~10-50ms
### Accuracy
- **Recall@5**: > 85% (correct tile in top-5)
- **Recall@1**: > 60% (correct tile is top-1)
### Scalability
- Support 10,000+ satellite tiles in database
- Fast query even with large database
## Dependencies
### Internal Components
- **F16 Model Manager**: For DINOv2 model
- **H04 Faiss Index Manager**: For similarity search
- **F04 Satellite Data Manager**: For tile metadata
- **F12 Route Chunk Manager**: For chunk image retrieval
### External Dependencies
- **DINOv2**: Foundation vision model
- **Faiss**: Similarity search library
- **numpy**: Array operations
## Data Models
### TileCandidate
```python
class TileCandidate(BaseModel):
tile_id: str
gps_center: GPSPoint
bounds: TileBounds
similarity_score: float
rank: int
spatial_score: Optional[float]
```
### DatabaseMatch
```python
class DatabaseMatch(BaseModel):
index: int
tile_id: str
distance: float
similarity_score: float
```
### SatelliteTile
```python
class SatelliteTile(BaseModel):
    # np.ndarray is not a native Pydantic type; this model needs
    # model_config = ConfigDict(arbitrary_types_allowed=True)
    tile_id: str
    image: np.ndarray
    gps_center: GPSPoint
    bounds: TileBounds
    descriptor: Optional[np.ndarray]
```