detections/_docs/02_document/modules/inference.md
# Module: inference
## Purpose
Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
## Public Interface
### Class: Inference
#### Fields
| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | object | internal | LoaderHttpClient instance |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `model_width` | int | internal | Model input width in pixels |
| `model_height` | int | internal | Model input height in pixels |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |
#### Methods
| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
## Internal Logic
### Engine Initialization (`init_ai`)
1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built TensorRT engine from loader
3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
4. If no GPU → load OnnxEngine from ONNX model bytes
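The fallback chain above can be sketched as a standalone function. This is a hedged sketch, not the module's actual code: the injected callables (`load_trt`, `load_onnx`, `start_conversion`) and the loader-client method names are assumptions standing in for the real LoaderHttpClient and engine constructors.

```python
def choose_engine(converted_bytes, gpu_available, loader,
                  load_trt, load_onnx, start_conversion):
    """Sketch of the init_ai fallback chain (all names are assumptions)."""
    if converted_bytes is not None:
        return load_trt(converted_bytes)          # finished background build
    if gpu_available:
        engine_bytes = loader.download_engine()   # pre-built TensorRT engine
        if engine_bytes:
            return load_trt(engine_bytes)
        onnx_bytes = loader.download_onnx()
        start_conversion(onnx_bytes)              # async ONNX -> TensorRT build
        return load_onnx(onnx_bytes)              # serve ONNX in the meantime
    return load_onnx(loader.download_onnx())      # CPU-only host
```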
### Preprocessing
- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
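A numpy-only sketch of the equivalent transform (the real code delegates resize and normalization to `cv2.dnn.blobFromImage`; the nearest-neighbour resize here is a stand-in for OpenCV's interpolation and is an assumption, not the module's behavior):

```python
import numpy as np

def preprocess(frames, model_w, model_h):
    """Scale to 0..1, swap BGR->RGB, resize, and stack into an NCHW batch."""
    blobs = []
    for frame in frames:                         # frame: HxWx3 uint8, BGR
        h, w = frame.shape[:2]
        ys = np.arange(model_h) * h // model_h   # nearest-neighbour resize
        xs = np.arange(model_w) * w // model_w
        img = frame[ys][:, xs].astype(np.float32) / 255.0
        img = img[:, :, ::-1]                    # BGR -> RGB (swapRB)
        img = img.transpose(2, 0, 1)[None]       # HWC -> 1xCxHxW
        blobs.append(img)
    return np.vstack(blobs)                      # NxCxHxW batch
```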
### Postprocessing
- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
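The steps above can be sketched end to end. Detections are modeled as plain dicts here rather than the module's `Detection` class, and the IoU-based greedy pass is an assumption about how `remove_overlapping_detections` decides that two boxes overlap:

```python
def iou(a, b):
    """Intersection-over-union of two center-format boxes."""
    ax1, ay1 = a["cx"] - a["w"] / 2, a["cy"] - a["h"] / 2
    ax2, ay2 = a["cx"] + a["w"] / 2, a["cy"] + a["h"] / 2
    bx1, by1 = b["cx"] - b["w"] / 2, b["cy"] - b["h"] / 2
    bx2, by2 = b["cx"] + b["w"] / 2, b["cy"] + b["h"] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union else 0.0

def postprocess(output, model_w, model_h, prob_threshold, iou_threshold=0.5):
    """Decode [batch][det][x1,y1,x2,y2,conf,cls] into filtered detections."""
    results = []
    for dets in output:                           # one list per batch image
        kept = []
        # sort by confidence so the greedy pass keeps higher-confidence boxes
        for x1, y1, x2, y2, conf, cls in sorted(dets, key=lambda d: -d[4]):
            if conf < prob_threshold:
                continue
            cand = {"cx": (x1 + x2) / 2 / model_w,   # corner -> center,
                    "cy": (y1 + y2) / 2 / model_h,   # normalized to 0..1
                    "w": (x2 - x1) / model_w,
                    "h": (y2 - y1) / model_h,
                    "confidence": conf, "class_id": int(cls)}
            if all(iou(cand, k) < iou_threshold for k in kept):
                kept.append(cand)
        results.append(kept)
    return results
```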
### Image Processing
- Small images (≤1.5× model size): processed as a single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size computed from GSD × pixel dimensions.
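Tile planning from the rules above can be sketched as follows. The `meters_in_tile` and `overlap` defaults are illustrative placeholders, not the module's `METERS_IN_TILE` constant or its configured overlap:

```python
def tile_origins(image_w, image_h, gsd, meters_in_tile=100.0, overlap=0.2):
    """Return the tile edge in pixels and the top-left corner of each tile."""
    tile = int(meters_in_tile / gsd)              # tile edge in pixels
    stride = max(1, int(tile * (1 - overlap)))    # step between tile origins
    xs = list(range(0, max(image_w - tile, 0) + 1, stride)) or [0]
    ys = list(range(0, max(image_h - tile, 0) + 1, stride)) or [0]
    # add a final tile flush with the image border if the grid falls short
    if xs[-1] + tile < image_w:
        xs.append(image_w - tile)
    if ys[-1] + tile < image_h:
        ys.append(image_h - tile)
    return tile, [(x, y) for y in ys for x in xs]
```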
### Video Processing
- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity: must differ from the previous annotation by at least one of:
- Time gap ≥ `frame_recognition_seconds`
- More detections than previous
- Any detection moved beyond `tracking_distance_confidence` threshold
- Any detection confidence increased beyond `tracking_probability_increase`
- Valid frames get JPEG-encoded image attached
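The validity test can be sketched as a predicate. Detections are reduced to `(cx, cy, confidence)` tuples matched by index, which is a deliberate simplification of the module's actual tracking; parameter names mirror the config keys above:

```python
import math

def annotation_is_new(prev, curr, time_gap_s, frame_recognition_seconds,
                      tracking_distance, tracking_prob_increase):
    """True if the current frame's detections warrant a new annotation."""
    if time_gap_s >= frame_recognition_seconds:
        return True                       # enough time has passed
    if len(curr) > len(prev):
        return True                       # more detections than before
    for (px, py, pc), (cx, cy, cc) in zip(prev, curr):
        if math.hypot(cx - px, cy - py) > tracking_distance:
            return True                   # a detection moved too far
        if cc - pc > tracking_prob_increase:
            return True                   # confidence jumped
    return False
```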
### Ground Sampling Distance (GSD)
`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
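A worked example of the formula, assuming the conventional units (sensor width and focal length in millimeters, altitude in meters, image width in pixels); the camera values below are illustrative only:

```python
def ground_sampling_distance(sensor_width_mm, altitude_m,
                             focal_length_mm, image_width_px):
    """Meters of ground covered by one pixel."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)

# e.g. a 13.2 mm sensor with an 8.8 mm lens at 100 m altitude,
# producing a 5472 px wide image -> roughly 2.7 cm per pixel
gsd = ground_sampling_distance(13.2, 100, 8.8, 5472)
```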
## Dependencies
- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)
## Consumers
- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`
## Data Models
Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).
## Configuration
All runtime config comes via `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module-level via pynvml).
## External Integrations
- **Loader service** (via loader_client): model download/upload
## Security
None.
## Tests
None found.