add chunking

Oleksandr Bezdieniezhnykh
2025-11-27 03:43:19 +02:00
parent 4f8c18a066
commit 2037870f67
43 changed files with 7041 additions and 4135 deletions
# Global Place Recognition
## Interface Definition
**Interface Name**: `IGlobalPlaceRecognition`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class IGlobalPlaceRecognition(ABC):
    @abstractmethod
    def retrieve_candidate_tiles(self, image: np.ndarray, top_k: int) -> List[TileCandidate]:
        pass

    @abstractmethod
    def compute_location_descriptor(self, image: np.ndarray) -> np.ndarray:
        pass

    @abstractmethod
    def query_database(self, descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]:
        pass

    @abstractmethod
    def rank_candidates(self, candidates: List[TileCandidate]) -> List[TileCandidate]:
        pass

    @abstractmethod
    def initialize_database(self, satellite_tiles: List[SatelliteTile]) -> bool:
        pass

    @abstractmethod
    def retrieve_candidate_tiles_for_chunk(self, chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]:
        pass

    @abstractmethod
    def compute_chunk_descriptor(self, chunk_images: List[np.ndarray]) -> np.ndarray:
        pass
```
## Component Description
### Responsibilities
- AnyLoc (DINOv2 + VLAD) for coarse localization after tracking loss
- "Kidnapped robot" recovery after sharp turns
- Compute image descriptors robust to season/appearance changes
- Query Faiss index of satellite tile descriptors
- Return top-k candidate tile regions for progressive refinement
- Initialize satellite descriptor database during system startup
- **Chunk semantic matching (aggregate DINOv2 features)**
- **Chunk descriptor computation for robust matching**
### Scope
- Global localization (not frame-to-frame)
- Appearance-based place recognition
- Handles domain gap (UAV vs satellite imagery)
- Semantic feature extraction (DINOv2)
- Efficient similarity search (Faiss)
- **Chunk-level matching (more robust than single-image)**
## API Methods
### `retrieve_candidate_tiles(image: np.ndarray, top_k: int) -> List[TileCandidate]`
**Description**: Retrieves top-k candidate satellite tiles for a UAV image.
**Called By**:
- F11 Failure Recovery Coordinator (after tracking loss)
**Input**:
```python
image: np.ndarray # UAV image
top_k: int # Number of candidates (typically 5)
```
**Output**:
```python
List[TileCandidate]:
tile_id: str
gps_center: GPSPoint
similarity_score: float
rank: int
```
**Processing Flow**:
1. compute_location_descriptor(image) → descriptor
2. query_database(descriptor, top_k) → database_matches
3. Retrieve tile metadata for matches
4. rank_candidates() → sorted by similarity
5. Return top-k candidates
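The flow above can be sketched as follows. This is an illustrative sketch, not the production implementation: descriptor extraction and the database query are passed in as callables, and `TileCandidate` is reduced to the fields this step touches.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class TileCandidate:
    tile_id: str
    similarity_score: float
    rank: int = 0


def retrieve_candidate_tiles(
    image: np.ndarray,
    top_k: int,
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
    query_database: Callable[[np.ndarray, int], List[TileCandidate]],
) -> List[TileCandidate]:
    # 1. Compute the global descriptor for the query image.
    descriptor = compute_descriptor(image)
    # 2-3. Query the descriptor database for the nearest tiles.
    matches = query_database(descriptor, top_k)
    # 4. Rank by similarity (highest first) and assign rank indices.
    ranked = sorted(matches, key=lambda c: c.similarity_score, reverse=True)
    for i, cand in enumerate(ranked):
        cand.rank = i
    # 5. Return the top-k candidates; an empty list signals a failed query.
    return ranked[:top_k]
```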
**Error Conditions**:
- Returns empty list: Database not initialized, query failed
**Test Cases**:
1. **UAV image over Ukraine**: Returns relevant tiles
2. **Different season**: DINOv2 handles appearance change
3. **Recall@5**: Correct tile in top-5 > 85%
---
### `compute_location_descriptor(image: np.ndarray) -> np.ndarray`
**Description**: Computes global descriptor using DINOv2 + VLAD aggregation.
**Called By**:
- Internal (during retrieve_candidate_tiles)
- System initialization (for satellite database)
**Input**:
```python
image: np.ndarray # UAV or satellite image
```
**Output**:
```python
np.ndarray: Descriptor vector (4096-dim or 8192-dim)
```
**Algorithm (AnyLoc)**:
1. Extract DINOv2 features (dense feature map)
2. Apply VLAD (Vector of Locally Aggregated Descriptors) aggregation
3. L2-normalize descriptor
4. Return compact global descriptor
**Processing Details**:
- Uses F16 Model Manager to get DINOv2 model
- Dense features: extracts from multiple spatial locations
- VLAD codebook: pre-trained cluster centers
- Semantic features: invariant to texture/color changes
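A minimal NumPy sketch of the VLAD aggregation step described above; the DINOv2 feature extraction itself is omitted, and the per-cluster intra-normalization shown here is one common variant, not necessarily the exact AnyLoc configuration.

```python
import numpy as np


def vlad_aggregate(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """VLAD: for each cluster, sum residuals of the features assigned to it.

    features: (N, D) dense local features (e.g. DINOv2 patch tokens)
    codebook: (K, D) pre-trained cluster centers
    returns:  (K*D,) L2-normalized global descriptor
    """
    # Hard-assign each feature to its nearest cluster center.
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)
    K, D = codebook.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = features[assign == k]
        if len(members):
            # Accumulate residuals to the cluster center.
            vlad[k] = (members - codebook[k]).sum(axis=0)
    # Intra-normalize per cluster, then flatten and L2-normalize.
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = np.where(norms > 0, vlad / norms, vlad)
    out = vlad.ravel()
    return out / (np.linalg.norm(out) + 1e-12)
```

With a 4096-dim descriptor, K*D might be, for example, 32 clusters of 128-dim features; the spec leaves the exact factorization open.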
**Performance**:
- Inference time: ~150ms for DINOv2 + VLAD
**Test Cases**:
1. **Same location, different season**: Similar descriptors
2. **Different locations**: Dissimilar descriptors
3. **UAV vs satellite**: Domain-invariant features
---
### `query_database(descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]`
**Description**: Queries Faiss index for most similar satellite tiles.
**Called By**:
- Internal (during retrieve_candidate_tiles)
**Input**:
```python
descriptor: np.ndarray # Query descriptor
top_k: int
```
**Output**:
```python
List[DatabaseMatch]:
index: int # Tile index in database
distance: float # L2 distance
similarity_score: float # Normalized score
```
**Processing Details**:
- Uses H04 Faiss Index Manager
- Index type: IVF (Inverted File) or HNSW for fast search
- Distance metric: L2 (Euclidean)
- Query time: ~10-50ms for 10,000+ tiles
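As an illustration, an exhaustive L2 search over the descriptor matrix behaves like `faiss.IndexFlatL2` on a small database (IVF and HNSW trade exactness for speed on larger ones). The distance-to-similarity mapping below is one simple convention, not prescribed by the spec.

```python
from typing import List, Tuple

import numpy as np


def query_descriptors(
    db: np.ndarray, query: np.ndarray, top_k: int
) -> List[Tuple[int, float, float]]:
    """Exhaustive L2 search over (N, D) tile descriptors.

    Returns (index, l2_distance, similarity_score) triples,
    nearest first, mirroring the DatabaseMatch fields.
    """
    dists = np.linalg.norm(db - query[None, :], axis=1)
    order = np.argsort(dists)[:top_k]
    # Map distance to a (0, 1] similarity score (illustrative convention).
    return [(int(i), float(dists[i]), float(1.0 / (1.0 + dists[i]))) for i in order]
```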
**Error Conditions**:
- Returns empty list: Query failed
**Test Cases**:
1. **Query satellite database**: Returns top-5 matches
2. **Large database (10,000 tiles)**: Fast retrieval (<50ms)
---
### `rank_candidates(candidates: List[TileCandidate]) -> List[TileCandidate]`
**Description**: Re-ranks candidates based on additional heuristics.
**Called By**:
- Internal (during retrieve_candidate_tiles)
**Input**:
```python
candidates: List[TileCandidate] # Initial ranking by similarity
```
**Output**:
```python
List[TileCandidate] # Re-ranked list
```
**Re-ranking Factors**:
1. **Similarity score**: Primary factor
2. **Spatial proximity**: Prefer tiles near dead-reckoning estimate
3. **Previous trajectory**: Favor continuation of route
4. **Geofence constraints**: Within operational area
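A sketch of how the first two factors might be combined. The weights, the distance-based spatial score, and the simplified `Cand` type are illustrative assumptions; the trajectory and geofence factors are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cand:
    tile_id: str
    center: Tuple[float, float]  # simplified stand-in for GPSPoint
    similarity_score: float
    spatial_score: float = 0.0


def rank_candidates(
    cands: List[Cand],
    dead_reckoning: Optional[Tuple[float, float]] = None,
    w_sim: float = 0.7,
    w_spatial: float = 0.3,
) -> List[Cand]:
    def combined(c: Cand) -> float:
        if dead_reckoning is None:
            return c.similarity_score
        # Closer tiles get a higher spatial score (illustrative scoring).
        c.spatial_score = 1.0 / (1.0 + math.dist(c.center, dead_reckoning))
        return w_sim * c.similarity_score + w_spatial * c.spatial_score

    return sorted(cands, key=combined, reverse=True)
```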
**Test Cases**:
1. **Spatial re-ranking**: Closer tile promoted
2. **Similar scores**: Spatial proximity breaks tie
---
### `initialize_database(satellite_tiles: List[SatelliteTile]) -> bool`
**Description**: Initializes satellite descriptor database during system startup.
**Called By**:
- F02 Flight Manager (during system initialization)
**Input**:
```python
List[SatelliteTile]:
tile_id: str
image: np.ndarray
gps_center: GPSPoint
bounds: TileBounds
```
**Output**:
```python
bool: True if database initialized successfully
```
**Processing Flow**:
1. For each satellite tile:
- compute_location_descriptor(tile.image) → descriptor
- Store descriptor with tile metadata
2. Build Faiss index using H04 Faiss Index Manager
3. Persist index to disk for fast startup
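A sketch of this flow under simplifying assumptions: tiles are reduced to `(tile_id, image)` pairs, and the descriptors are persisted as a raw NumPy matrix rather than a real Faiss index (in the real system, step 2 goes through H04).

```python
from typing import Callable, List, Tuple

import numpy as np


def initialize_database(
    tiles: List[Tuple[str, np.ndarray]],
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
    index_path: str,
) -> bool:
    try:
        # 1. Compute one descriptor per satellite tile.
        ids = [tid for tid, _ in tiles]
        descs = np.stack([compute_descriptor(img) for _, img in tiles])
        # 2-3. H04 would build and persist a Faiss index (IVF/HNSW) here;
        # we persist the raw descriptor matrix instead.
        np.savez(index_path, ids=np.array(ids), descriptors=descs)
        return True
    except Exception:
        return False
```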
**Performance**:
- Initialization time: ~10-30 minutes for 10,000 tiles (one-time cost)
- Can be done offline and loaded at startup
**Test Cases**:
1. **Initialize with 1000 tiles**: Completes successfully
2. **Load pre-built index**: Fast startup (<10s)
---
### `retrieve_candidate_tiles_for_chunk(chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]`
**Description**: Retrieves top-k candidate satellite tiles for a chunk using aggregate descriptor.
**Called By**:
- F11 Failure Recovery Coordinator (chunk semantic matching)
- F12 Route Chunk Manager (chunk matching coordination)
**Input**:
```python
chunk_images: List[np.ndarray] # 5-20 images from chunk
top_k: int # Number of candidates (typically 5)
```
**Output**:
```python
List[TileCandidate]:
tile_id: str
gps_center: GPSPoint
similarity_score: float
rank: int
```
**Processing Flow**:
1. compute_chunk_descriptor(chunk_images) → aggregate descriptor
2. query_database(descriptor, top_k) → database_matches
3. Retrieve tile metadata for matches
4. rank_candidates() → sorted by similarity
5. Return top-k candidates
**Advantages over Single-Image Matching**:
- Aggregate descriptor more robust to featureless terrain
- Multiple images provide more context
- Better handles plain fields where single-image matching fails
**Test Cases**:
1. **Chunk matching**: Returns relevant tiles
2. **Featureless terrain**: Succeeds where single-image fails
3. **Recall@5**: Correct tile in top-5 > 90% (better than single-image)
---
### `compute_chunk_descriptor(chunk_images: List[np.ndarray]) -> np.ndarray`
**Description**: Computes aggregate DINOv2 descriptor from multiple chunk images.
**Called By**:
- Internal (during retrieve_candidate_tiles_for_chunk)
- F12 Route Chunk Manager (delegates chunk descriptor computation to F08)
**Input**:
```python
chunk_images: List[np.ndarray] # 5-20 images from chunk
```
**Output**:
```python
np.ndarray: Aggregated descriptor vector (4096-dim or 8192-dim)
```
**Algorithm**:
1. For each image in chunk:
- compute_location_descriptor(image) → descriptor (DINOv2 + VLAD)
2. Aggregate descriptors:
- **Mean aggregation**: Average all descriptors
- **VLAD aggregation**: Use VLAD codebook for aggregation
- **Max aggregation**: Element-wise maximum
3. L2-normalize aggregated descriptor
4. Return composite descriptor
**Aggregation Strategy**:
- **Mean**: Simple average (default)
- **VLAD**: More sophisticated, preserves spatial information
- **Max**: Emphasizes strongest features
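The mean and max strategies can be sketched directly; the VLAD variant needs the pre-trained codebook and is omitted here. The small epsilon is a guard against a zero vector, an implementation detail the spec leaves open.

```python
import numpy as np


def compute_chunk_descriptor(descriptors: np.ndarray, method: str = "mean") -> np.ndarray:
    """Aggregate per-image descriptors (N, D) into one chunk descriptor (D,)."""
    if method == "mean":
        agg = descriptors.mean(axis=0)
    elif method == "max":
        agg = descriptors.max(axis=0)  # element-wise maximum
    else:
        raise ValueError(f"unknown aggregation: {method}")
    # L2-normalize so chunk and single-image descriptors are comparable.
    return agg / (np.linalg.norm(agg) + 1e-12)
```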
**Performance**:
- Descriptor computation: ~150ms × N images (can be parallelized)
- Aggregation: ~10ms
**Test Cases**:
1. **Compute descriptor**: Returns aggregated descriptor
2. **Multiple images**: Descriptor aggregates correctly
3. **Descriptor quality**: More robust than single-image descriptor
## Integration Tests
### Test 1: Place Recognition Flow
1. Load UAV image from sharp turn
2. retrieve_candidate_tiles(top_k=5)
3. Verify correct tile in top-5
4. Pass candidates to F11 Failure Recovery
### Test 2: Season Invariance
1. Satellite tiles from summer
2. UAV images from autumn
3. retrieve_candidate_tiles() → correct match despite appearance change
### Test 3: Database Initialization
1. Prepare 500 satellite tiles
2. initialize_database(tiles)
3. Verify Faiss index built
4. Query with test image → returns matches
### Test 4: Chunk Semantic Matching
1. Build chunk with 10 images (plain field scenario)
2. compute_chunk_descriptor() → aggregate descriptor
3. retrieve_candidate_tiles_for_chunk() → returns candidates
4. Verify correct tile in top-5 (where single-image matching failed)
5. Verify chunk matching more robust than single-image
## Non-Functional Requirements
### Performance
- **retrieve_candidate_tiles**: < 200ms total
- Descriptor computation: ~150ms
- Database query: ~50ms
- **compute_location_descriptor**: ~150ms
- **query_database**: ~10-50ms
### Accuracy
- **Recall@5**: > 85% (correct tile in top-5)
- **Recall@1**: > 60% (correct tile is top-1)
### Scalability
- Support 10,000+ satellite tiles in database
- Fast query even with large database
## Dependencies
### Internal Components
- **F16 Model Manager**: For DINOv2 model
- **H04 Faiss Index Manager**: For similarity search
- **F04 Satellite Data Manager**: For tile metadata
- **F12 Route Chunk Manager**: For chunk image retrieval
### External Dependencies
- **DINOv2**: Foundation vision model
- **Faiss**: Similarity search library
- **numpy**: Array operations
## Data Models
### TileCandidate
```python
class TileCandidate(BaseModel):
tile_id: str
gps_center: GPSPoint
bounds: TileBounds
similarity_score: float
rank: int
spatial_score: Optional[float]
```
### DatabaseMatch
```python
class DatabaseMatch(BaseModel):
index: int
tile_id: str
distance: float
similarity_score: float
```
### SatelliteTile
```python
class SatelliteTile(BaseModel):
    # np.ndarray is not a native Pydantic type; this model needs
    # model_config = ConfigDict(arbitrary_types_allowed=True)
    tile_id: str
    image: np.ndarray
    gps_center: GPSPoint
    bounds: TileBounds
    descriptor: Optional[np.ndarray]
```