mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-23 01:16:38 +00:00
# Global Place Recognition

## Interface Definition

**Interface Name**: `IGlobalPlaceRecognition`

### Interface Methods

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import List

import numpy as np

# TileCandidate, DatabaseMatch and SatelliteTile are defined under Data Models.


class IGlobalPlaceRecognition(ABC):
    @abstractmethod
    def retrieve_candidate_tiles(self, image: np.ndarray, top_k: int) -> List[TileCandidate]:
        """Top-k candidate satellite tiles for a single UAV image."""

    @abstractmethod
    def compute_location_descriptor(self, image: np.ndarray) -> np.ndarray:
        """Global DINOv2 + VLAD descriptor for one image."""

    @abstractmethod
    def query_database(self, descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]:
        """Nearest satellite-tile descriptors in the Faiss index."""

    @abstractmethod
    def rank_candidates(self, candidates: List[TileCandidate]) -> List[TileCandidate]:
        """Re-rank candidates with additional heuristics."""

    @abstractmethod
    def initialize_database(self, satellite_tiles: List[SatelliteTile]) -> bool:
        """Compute tile descriptors and build the index; True on success."""

    @abstractmethod
    def retrieve_candidate_tiles_for_chunk(self, chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]:
        """Top-k candidate satellite tiles for a chunk of UAV images."""

    @abstractmethod
    def compute_chunk_descriptor(self, chunk_images: List[np.ndarray]) -> np.ndarray:
        """Aggregate descriptor over all images in a chunk."""
```

## Component Description

### Responsibilities

- AnyLoc (DINOv2 + VLAD) for coarse localization after tracking loss
- "Kidnapped robot" recovery after sharp turns
- Compute image descriptors robust to season/appearance changes
- Query Faiss index of satellite tile descriptors
- Return top-k candidate tile regions for progressive refinement
- Initialize satellite descriptor database during system startup
- **Chunk semantic matching (aggregate DINOv2 features)**
- **Chunk descriptor computation for robust matching**

### Scope

- Global localization (not frame-to-frame)
- Appearance-based place recognition
- Handles domain gap (UAV vs satellite imagery)
- Semantic feature extraction (DINOv2)
- Efficient similarity search (Faiss)
- **Chunk-level matching (more robust than single-image)**

## API Methods

### `retrieve_candidate_tiles(image: np.ndarray, top_k: int) -> List[TileCandidate]`

**Description**: Retrieves the top-k candidate satellite tiles for a single UAV image.

**Called By**:

- F11 Failure Recovery Coordinator (after tracking loss)

**Input**:

```python
image: np.ndarray  # UAV image
top_k: int         # Number of candidates (typically 5)
```

**Output**:

```python
List[TileCandidate]:
    tile_id: str
    gps_center: GPSPoint
    similarity_score: float
    rank: int
```

**Processing Flow**:

1. `compute_location_descriptor(image)` → descriptor
2. `query_database(descriptor, top_k)` → database_matches
3. Retrieve tile metadata for the matches (F04 Satellite Data Manager)
4. `rank_candidates()` → candidates re-ranked by similarity and spatial heuristics
5. Return the top-k candidates

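The five steps above form a short pipeline. The sketch below wires them together with injected callables so it runs without DINOv2 or Faiss. `Candidate` is a hypothetical, simplified stand-in for `TileCandidate`, and the toy `brute_force` query and similarity-only ranker in the usage part are illustrative assumptions, not this component's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np


@dataclass
class Candidate:
    """Hypothetical, simplified stand-in for TileCandidate (GPS fields omitted)."""
    tile_id: str
    similarity_score: float
    rank: int = 0


def retrieve_candidate_tiles(
    image: np.ndarray,
    top_k: int,
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
    query_database: Callable[[np.ndarray, int], List[Tuple[int, float]]],
    tile_ids: Sequence[str],
    rank_candidates: Callable[[List[Candidate]], List[Candidate]],
) -> List[Candidate]:
    descriptor = compute_descriptor(image)                        # 1. global descriptor
    matches = query_database(descriptor, top_k)                   # 2. (row index, similarity)
    if not matches:
        return []                                                 # uninitialized DB or failed query
    candidates = [Candidate(tile_ids[i], s) for i, s in matches]  # 3. attach tile metadata
    ranked = rank_candidates(candidates)[:top_k]                  # 4. re-rank
    for position, cand in enumerate(ranked, start=1):
        cand.rank = position                                      # rank 1 = best
    return ranked                                                 # 5. top-k candidates


# Toy usage: the "image" is already a 3-D feature vector and the DB holds three unit vectors.
db = np.eye(3, dtype=np.float32)
ids = ["tile_a", "tile_b", "tile_c"]


def brute_force(query: np.ndarray, k: int) -> List[Tuple[int, float]]:
    sims = db @ query                      # cosine similarity for unit vectors
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


result = retrieve_candidate_tiles(
    np.array([0.1, 0.9, 0.0], dtype=np.float32),
    top_k=2,
    compute_descriptor=lambda img: img / np.linalg.norm(img),
    query_database=brute_force,
    tile_ids=ids,
    rank_candidates=lambda cs: sorted(cs, key=lambda c: c.similarity_score, reverse=True),
)
```

Injecting the steps as callables keeps the orchestration testable in isolation from the model and index managers (F16, H04).
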
**Error Conditions**:

- Returns an empty list if the database is not initialized or the query fails

**Test Cases**:

1. **UAV image over Ukraine**: Returns relevant tiles
2. **Different season**: Returns correct tiles despite seasonal appearance change (DINOv2 features)
3. **Top-5 accuracy (Recall@5)**: Correct tile in top-5 > 85%

---

### `compute_location_descriptor(image: np.ndarray) -> np.ndarray`

**Description**: Computes a global descriptor using DINOv2 features with VLAD aggregation.

**Called By**:

- Internal (during `retrieve_candidate_tiles`)
- System initialization (for the satellite database)

**Input**:

```python
image: np.ndarray  # UAV or satellite image
```

**Output**:

```python
np.ndarray  # Descriptor vector (4096-dim or 8192-dim)
```

**Algorithm (AnyLoc)**:

1. Extract DINOv2 features (dense feature map)
2. Apply VLAD (Vector of Locally Aggregated Descriptors) aggregation
3. L2-normalize the descriptor
4. Return the compact global descriptor

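Steps 2 and 3 can be sketched in NumPy as follows, assuming the dense DINOv2 features have already been extracted as an `(N, D)` array of per-patch descriptors and the codebook is a `(K, D)` array of pre-trained cluster centers. Hard assignment to the nearest center and per-cluster intra-normalization follow common VLAD practice; whether this component uses exactly these normalizations is an assumption, and `vlad_aggregate` is a hypothetical name. The output length is `K * D`, so the 4096/8192-dim sizes quoted above would correspond to particular choices of K and D (or a projection afterwards), which are not specified here.

```python
import numpy as np


def vlad_aggregate(features: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """VLAD-encode dense local features against a fixed codebook.

    features: (N, D) per-patch descriptors (e.g. DINOv2 patch tokens).
    centers:  (K, D) pre-trained cluster centers.
    Returns an L2-normalized vector of length K * D.
    """
    # Squared L2 distance to every center without an (N, K, D) temporary:
    # |f - c|^2 = |f|^2 - 2 f.c + |c|^2
    d2 = (
        (features ** 2).sum(axis=1)[:, None]
        - 2.0 * features @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )
    assign = d2.argmin(axis=1)  # hard assignment: nearest center per feature

    k, d = centers.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for c in range(k):
        members = features[assign == c]
        if len(members):
            # Sum of residuals between member features and their center.
            vlad[c] = (members - centers[c]).sum(axis=0)

    # Intra-normalization: L2-normalize each cluster's residual block.
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = np.divide(vlad, norms, out=np.zeros_like(vlad), where=norms > 0)

    # Flatten and L2-normalize the whole descriptor (step 3).
    flat = vlad.reshape(-1)
    total = np.linalg.norm(flat)
    return flat / total if total > 0 else flat
```

Because VLAD sums residuals per cluster, the result does not depend on where in the image each patch came from; it captures how local features are distributed around the codebook centers.
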
**Processing Details**:

- Uses F16 Model Manager to get the DINOv2 model
- Dense features: extracted at multiple spatial locations across the image
- VLAD codebook: pre-trained cluster centers
- Semantic features: largely robust to texture and color changes

**Performance**:

- Inference time: ~150ms for DINOv2 + VLAD

**Test Cases**:

1. **Same location, different season**: Similar descriptors
2. **Different locations**: Dissimilar descriptors
3. **UAV vs satellite**: Domain-invariant features

---

### `query_database(descriptor: np.ndarray, top_k: int) -> List[DatabaseMatch]`

**Description**: Queries the Faiss index for the most similar satellite tiles.

**Called By**:

- Internal (during `retrieve_candidate_tiles`)

**Input**:

```python
descriptor: np.ndarray  # Query descriptor
top_k: int
```

**Output**:

```python
List[DatabaseMatch]:
    index: int               # Tile index in database
    distance: float          # L2 distance
    similarity_score: float  # Normalized score
```

**Processing Details**:

- Uses H04 Faiss Index Manager
- Index type: IVF (Inverted File) or HNSW for fast search
- Distance metric: L2 (Euclidean)
- Query time: ~10-50ms for 10,000+ tiles

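For L2-normalized descriptors, squared Euclidean distance and cosine similarity are interchangeable: ‖a − b‖² = 2 − 2·cos(a, b). The sketch below uses that identity to turn distances into a similarity score in [−1, 1]; this is one plausible definition of the "Normalized score", assumed here rather than taken from the design. It performs an exact brute-force search in NumPy, which returns the same neighbours as an exact Faiss `IndexFlatL2` (which also reports squared L2 distances); IVF and HNSW indexes trade some recall for speed, so their results can differ. `search_l2` is a hypothetical name.

```python
import numpy as np


def search_l2(database: np.ndarray, query: np.ndarray, top_k: int):
    """Exact top-k search by squared L2 distance.

    database: (M, D) L2-normalized tile descriptors, one row per tile.
    query:    (D,)   L2-normalized query descriptor.
    Returns a list of (index, squared_distance, similarity) tuples, nearest first.
    """
    if database.size == 0 or top_k <= 0:
        return []  # mirrors "Returns empty list" when the query cannot run
    diff = database - query[None, :]
    sq_dist = np.einsum("ij,ij->i", diff, diff)  # (M,) squared L2 distances
    k = min(top_k, len(sq_dist))
    order = np.argsort(sq_dist, kind="stable")[:k]
    # For unit vectors: cos = 1 - d^2 / 2
    return [(int(i), float(sq_dist[i]), float(1.0 - sq_dist[i] / 2.0)) for i in order]
```

With Faiss installed, `faiss.IndexFlatL2(d)`, `index.add(database)` and `index.search(query[None, :], k)` perform the same exact search on `float32` arrays; swapping in an IVF or HNSW index is what brings query time into the ~10-50ms range at 10,000+ tiles.
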
**Error Conditions**:

- Returns an empty list if the query fails

**Test Cases**:

1. **Query satellite database**: Returns top-5 matches
2. **Large database (10,000 tiles)**: Fast retrieval (<50ms)

---

### `rank_candidates(candidates: List[TileCandidate]) -> List[TileCandidate]`

**Description**: Re-ranks candidates using additional heuristics on top of descriptor similarity.

**Called By**:

- Internal (during `retrieve_candidate_tiles`)

**Input**:

```python
candidates: List[TileCandidate]  # Initial ranking by similarity
```

**Output**:

```python
List[TileCandidate]  # Re-ranked list
```

**Re-ranking Factors**:

1. **Similarity score**: Primary factor
2. **Spatial proximity**: Prefer tiles near the dead-reckoning estimate
3. **Previous trajectory**: Favor continuation of the route
4. **Geofence constraints**: Restrict candidates to the operational area

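One way to combine these factors is a weighted score after a geofence filter. The sketch below is an illustration under stated assumptions: the geofence is a lat/lon bounding box, spatial proximity is scored as `exp(-distance / scale_m)` from a haversine distance to the dead-reckoning estimate, and the weights `w_sim`/`w_spatial` and `scale_m` are hypothetical values, not tuned parameters from this design. The previous-trajectory factor is omitted for brevity, and `Candidate` is a simplified stand-in for `TileCandidate`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical, simplified stand-in for TileCandidate."""
    tile_id: str
    lat: float
    lon: float
    similarity_score: float
    rank: int = 0
    spatial_score: Optional[float] = None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (R = 6371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rank_candidates(
    candidates: List[Candidate],
    dr_estimate: Tuple[float, float],                 # (lat, lon) from dead reckoning
    geofence: Tuple[float, float, float, float],      # (min_lat, min_lon, max_lat, max_lon)
    w_sim: float = 0.7,                               # hypothetical weights
    w_spatial: float = 0.3,
    scale_m: float = 2_000.0,                         # hypothetical distance scale
) -> List[Candidate]:
    min_lat, min_lon, max_lat, max_lon = geofence
    # Geofence: drop candidates outside the operational area.
    kept = [c for c in candidates
            if min_lat <= c.lat <= max_lat and min_lon <= c.lon <= max_lon]
    for c in kept:
        d = haversine_m(c.lat, c.lon, *dr_estimate)
        c.spatial_score = math.exp(-d / scale_m)      # 1.0 at the estimate, decays with distance
    kept.sort(key=lambda c: w_sim * c.similarity_score + w_spatial * c.spatial_score,
              reverse=True)
    for i, c in enumerate(kept, start=1):
        c.rank = i
    return kept


cands = [
    Candidate("A", 50.45, 30.52, similarity_score=0.80),  # at the dead-reckoning estimate
    Candidate("B", 50.60, 30.80, similarity_score=0.82),  # roughly 26 km away
    Candidate("C", 52.00, 30.50, similarity_score=0.99),  # outside the geofence
]
ranked = rank_candidates(cands, dr_estimate=(50.45, 30.52), geofence=(50.0, 30.0, 51.0, 31.0))
```

In this example A and B have similar descriptor scores, so spatial proximity decides the order, matching the "Similar scores" test case below; C is removed despite the highest similarity because it lies outside the geofence.
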
**Test Cases**:

1. **Spatial re-ranking**: Closer tile promoted
2. **Similar scores**: Spatial proximity breaks the tie

---

### `initialize_database(satellite_tiles: List[SatelliteTile]) -> bool`

**Description**: Initializes the satellite descriptor database during system startup.

**Called By**:

- F02 Flight Manager (during system initialization)

**Input**:

```python
List[SatelliteTile]:
    tile_id: str
    image: np.ndarray
    gps_center: GPSPoint
    bounds: TileBounds
```

**Output**:

```python
bool  # True if the database was initialized successfully
```

**Processing Flow**:

1. For each satellite tile:
   - `compute_location_descriptor(tile.image)` → descriptor
   - Store the descriptor with the tile metadata
2. Build the Faiss index using H04 Faiss Index Manager
3. Persist the index to disk for fast startup

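Steps 1 and 3 can be sketched with NumPy alone, assuming descriptors are stored as one L2-normalized row per tile with a parallel list of tile IDs, and using an `.npz` file as a hypothetical on-disk format. Step 2 (building the Faiss index) belongs to H04 and is not shown; Faiss itself offers `faiss.write_index` and `faiss.read_index` for persisting a built index. `build_descriptor_db`, `save_db` and `load_db` are hypothetical helper names.

```python
import os
import tempfile
from typing import Callable, List, Sequence, Tuple

import numpy as np


def build_descriptor_db(
    tiles: Sequence[Tuple[str, np.ndarray]],                  # (tile_id, image) pairs
    compute_descriptor: Callable[[np.ndarray], np.ndarray],
) -> Tuple[np.ndarray, List[str]]:
    """Step 1: one L2-normalized descriptor row per tile; row i <-> tile_ids[i]."""
    tile_ids, rows = [], []
    for tile_id, image in tiles:
        d = np.asarray(compute_descriptor(image), dtype=np.float32)
        rows.append(d / np.linalg.norm(d))
        tile_ids.append(tile_id)
    return np.stack(rows), tile_ids


def save_db(path: str, descriptors: np.ndarray, tile_ids: List[str]) -> None:
    """Step 3: persist so later startups can skip descriptor computation."""
    np.savez(path, descriptors=descriptors, tile_ids=np.array(tile_ids))


def load_db(path: str) -> Tuple[np.ndarray, List[str]]:
    with np.load(path) as data:
        return data["descriptors"], [str(t) for t in data["tile_ids"]]


# Toy usage: a column-mean stands in for the real DINOv2 + VLAD descriptor.
rng = np.random.default_rng(0)
tiles = [(f"tile_{i}", rng.random((8, 8))) for i in range(5)]
desc, ids = build_descriptor_db(tiles, compute_descriptor=lambda img: img.mean(axis=0))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sat_db.npz")
    save_db(path, desc, ids)
    loaded_desc, loaded_ids = load_db(path)
```

Keeping the row order and the ID list together in one file is what lets a Faiss row index returned by `query_database` be mapped back to a `tile_id` after a restart.
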
**Performance**:

- Initialization time: ~10-30 minutes for 10,000 tiles (one-time cost; 10,000 × ~150ms ≈ 25 minutes of descriptor computation)
- Can be done offline and loaded at startup

**Test Cases**:

1. **Initialize with 1000 tiles**: Completes successfully
2. **Load pre-built index**: Fast startup (<10s)

---

### `retrieve_candidate_tiles_for_chunk(chunk_images: List[np.ndarray], top_k: int) -> List[TileCandidate]`

**Description**: Retrieves the top-k candidate satellite tiles for a chunk using an aggregate descriptor.

**Called By**:

- F11 Failure Recovery Coordinator (chunk semantic matching)
- F12 Route Chunk Manager (chunk matching coordination)

**Input**:

```python
chunk_images: List[np.ndarray]  # 5-20 images from chunk
top_k: int                      # Number of candidates (typically 5)
```

**Output**:

```python
List[TileCandidate]:
    tile_id: str
    gps_center: GPSPoint
    similarity_score: float
    rank: int
```

**Processing Flow**:

1. `compute_chunk_descriptor(chunk_images)` → aggregate descriptor
2. `query_database(descriptor, top_k)` → database_matches
3. Retrieve tile metadata for the matches
4. `rank_candidates()` → candidates re-ranked by similarity and spatial heuristics
5. Return the top-k candidates

**Advantages over Single-Image Matching**:

- The aggregate descriptor is more robust over featureless terrain
- Multiple images provide more context
- Better handles plain fields, where single-image matching fails

**Test Cases**:

1. **Chunk matching**: Returns relevant tiles
2. **Featureless terrain**: Succeeds where single-image matching fails
3. **Top-5 accuracy (Recall@5)**: Correct tile in top-5 > 90% (vs. > 85% for single-image matching)

---

### `compute_chunk_descriptor(chunk_images: List[np.ndarray]) -> np.ndarray`

**Description**: Computes an aggregate DINOv2 descriptor from multiple chunk images.

**Called By**:

- Internal (during `retrieve_candidate_tiles_for_chunk`)
- F12 Route Chunk Manager (delegates chunk descriptor computation to F08)

**Input**:

```python
chunk_images: List[np.ndarray]  # 5-20 images from chunk
```

**Output**:

```python
np.ndarray  # Aggregated descriptor vector (4096-dim or 8192-dim)
```

**Algorithm**:

1. For each image in the chunk:
   - `compute_location_descriptor(image)` → descriptor (DINOv2 + VLAD)
2. Aggregate the descriptors using one of:
   - **Mean aggregation**: Average all descriptors
   - **VLAD aggregation**: Use the VLAD codebook for aggregation (e.g. VLAD-encode the pooled dense features of all chunk images)
   - **Max aggregation**: Element-wise maximum
3. L2-normalize the aggregated descriptor
4. Return the composite descriptor

**Aggregation Strategy**:

- **Mean**: Simple average (default)
- **VLAD**: More sophisticated; retains per-cluster residual statistics rather than a single average
- **Max**: Emphasizes the strongest features

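The mean and max strategies reduce to a few NumPy operations on the per-image global descriptors. The sketch below assumes each descriptor is a 1-D array of the same length; `aggregate_chunk_descriptors` is a hypothetical name, and the VLAD strategy is not shown because it would operate on dense features rather than on per-image global descriptors.

```python
from typing import List, Literal

import numpy as np


def aggregate_chunk_descriptors(
    descriptors: List[np.ndarray],
    strategy: Literal["mean", "max"] = "mean",
) -> np.ndarray:
    """Combine per-image global descriptors (each shape (D,)) into one chunk descriptor."""
    if not descriptors:
        raise ValueError("chunk must contain at least one descriptor")
    stacked = np.stack(descriptors)            # (N, D)
    if strategy == "mean":
        agg = stacked.mean(axis=0)             # simple average (default)
    elif strategy == "max":
        agg = stacked.max(axis=0)              # element-wise maximum
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    norm = np.linalg.norm(agg)
    return agg / norm if norm > 0 else agg     # step 3: L2-normalize
```

The renormalization in step 3 matters: the mean of unit-length descriptors is generally shorter than unit length, and the database descriptors it is compared against are unit length.
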
**Performance**:

- Descriptor computation: ~150ms × N images (≈ 0.75-3 s sequentially for 5-20 images; can be parallelized)
- Aggregation: ~10ms

**Test Cases**:

1. **Compute descriptor**: Returns an L2-normalized aggregated descriptor
2. **Multiple images**: Descriptor aggregates correctly
3. **Descriptor quality**: More robust than a single-image descriptor

## Integration Tests

### Test 1: Place Recognition Flow

1. Load a UAV image captured after a sharp turn
2. `retrieve_candidate_tiles(top_k=5)`
3. Verify the correct tile is in the top-5
4. Pass the candidates to F11 Failure Recovery Coordinator

### Test 2: Season Invariance

1. Satellite tiles from summer
2. UAV images from autumn
3. `retrieve_candidate_tiles()` → correct match despite appearance change

### Test 3: Database Initialization

1. Prepare 500 satellite tiles
2. `initialize_database(tiles)`
3. Verify the Faiss index is built
4. Query with a test image → returns matches

### Test 4: Chunk Semantic Matching

1. Build a chunk with 10 images (plain field scenario)
2. `compute_chunk_descriptor()` → aggregate descriptor
3. `retrieve_candidate_tiles_for_chunk()` → returns candidates
4. Verify the correct tile is in the top-5 (where single-image matching failed)
5. Verify chunk matching is more robust than single-image matching

## Non-Functional Requirements

### Performance

- **retrieve_candidate_tiles**: < 200ms total
  - Descriptor computation: ~150ms
  - Database query: ~50ms
- **compute_location_descriptor**: ~150ms
- **query_database**: ~10-50ms

### Accuracy

- **Recall@5**: > 85% (correct tile in top-5)
- **Recall@1**: > 60% (correct tile ranked first)

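Recall@K as used above can be computed from ranked retrieval results as follows. The sketch assumes exactly one ground-truth tile per query; if several overlapping tiles should count as correct, the membership test would need to change. `recall_at_k` is a hypothetical helper name.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[str]], true_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose ground-truth tile appears among the first k results."""
    if len(ranked_ids) != len(true_ids):
        raise ValueError("need one ranked list per ground-truth tile")
    if not true_ids:
        return 0.0
    hits = sum(1 for ranked, truth in zip(ranked_ids, true_ids) if truth in list(ranked)[:k])
    return hits / len(true_ids)
```

For example, if the correct tile is ranked first, second, third and absent for four queries, Recall@1 is 0.25 and Recall@3 is 0.75.
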
### Scalability

- Support 10,000+ satellite tiles in the database
- Keep query time within the ~10-50ms budget as the database grows

## Dependencies

### Internal Components

- **F16 Model Manager**: For DINOv2 model
- **H04 Faiss Index Manager**: For similarity search
- **F04 Satellite Data Manager**: For tile metadata
- **F12 Route Chunk Manager**: For chunk image retrieval

### External Dependencies

- **DINOv2**: Foundation vision model
- **Faiss**: Similarity search library
- **numpy**: Array operations

## Data Models

### TileCandidate

```python
from typing import Optional

from pydantic import BaseModel

# GPSPoint and TileBounds are defined elsewhere in the project.


class TileCandidate(BaseModel):
    tile_id: str
    gps_center: GPSPoint
    bounds: TileBounds
    similarity_score: float
    rank: int
    spatial_score: Optional[float] = None  # Spatial re-ranking score, if computed
```

### DatabaseMatch

```python
class DatabaseMatch(BaseModel):
    index: int               # Tile index in database
    tile_id: str
    distance: float          # L2 distance
    similarity_score: float  # Normalized score
```

### SatelliteTile

```python
import numpy as np
from pydantic import ConfigDict


class SatelliteTile(BaseModel):
    # np.ndarray is not a pydantic-native type (pydantic v2 config shown).
    model_config = ConfigDict(arbitrary_types_allowed=True)

    tile_id: str
    image: np.ndarray
    gps_center: GPSPoint
    bounds: TileBounds
    descriptor: Optional[np.ndarray] = None  # Filled in by initialize_database
```
