mirror of https://github.com/azaion/detections.git
synced 2026-04-22 22:16:31 +00:00
# Module: inference

## Purpose

Core inference orchestrator: manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
## Public Interface
|
||
|
||
### Class: Inference
|
||
|
||
#### Fields
|
||
|
||
| Field | Type | Access | Description |
|
||
|-------|------|--------|-------------|
|
||
| `loader_client` | object | internal | LoaderHttpClient instance |
|
||
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
|
||
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
|
||
| `stop_signal` | bool | internal | Flag to abort video processing |
|
||
| `model_width` | int | internal | Model input width in pixels |
|
||
| `model_height` | int | internal | Model input height in pixels |
|
||
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
|
||
| `is_building_engine` | bool | internal | True during async TensorRT conversion |
|
||
|
||
#### Methods
|
||
|
||
| Method | Signature | Access | Description |
|
||
|--------|-----------|--------|-------------|
|
||
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
|
||
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
|
||
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
|
||
| `stop` | `()` | cpdef | Sets stop_signal to True |
|
||
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
|
||
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
|
||
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
|
||
|
||
## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built TensorRT engine from loader
3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
4. If no GPU → load OnnxEngine from ONNX model bytes
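
The fallback chain above can be sketched as plain Python. This is a hypothetical illustration, not the module's actual code: the loader and engine functions are stand-in parameters that mirror the documented flow.

```python
def init_engine(converted_model_bytes, gpu_available,
                download_trt_engine, download_onnx_model,
                load_tensorrt, load_onnx, start_background_conversion):
    """Hypothetical sketch of the init_ai fallback order; all callables
    are stand-ins for the loader client and engine constructors."""
    if converted_model_bytes is not None:
        return load_tensorrt(converted_model_bytes)        # step 1
    if gpu_available:
        try:
            return load_tensorrt(download_trt_engine())    # step 2
        except Exception:
            onnx_bytes = download_onnx_model()             # step 3
            start_background_conversion(onnx_bytes)        # async TRT build
            return load_onnx(onnx_bytes)
    return load_onnx(download_onnx_model())                # step 4
```
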
### Preprocessing

- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
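
As a rough numpy-only equivalent of the `blobFromImage` call described above (nearest-neighbor indexing stands in for OpenCV's resize interpolation, which is an assumption):

```python
import numpy as np

def make_blob(frames, model_w, model_h):
    """Approximate the described preprocessing without OpenCV:
    resize to model dims, scale to 0..1, swap BGR->RGB, stack an NCHW batch."""
    blobs = []
    for frame in frames:                           # frame: H x W x 3, uint8, BGR
        h, w = frame.shape[:2]
        rows = np.arange(model_h) * h // model_h   # nearest-neighbor row indices
        cols = np.arange(model_w) * w // model_w   # nearest-neighbor col indices
        resized = frame[rows[:, None], cols[None, :]]
        rgb = resized[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, 0..1
        blobs.append(rgb.transpose(2, 0, 1)[np.newaxis])      # HWC -> 1xCxHxW
    return np.vstack(blobs)                        # (N, 3, model_h, model_w)
```
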
### Postprocessing

- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps the higher-confidence detection)
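
A minimal sketch of this pipeline, with `Detection` objects replaced by plain dicts; the IoU-based overlap criterion and its 0.5 threshold are assumptions, since the source does not specify how `remove_overlapping_detections` measures overlap:

```python
def iou(a, b):
    """Intersection-over-union of two corner-format boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def postprocess(output, model_w, model_h, prob_threshold, overlap_threshold=0.5):
    """Per image: threshold by confidence, normalize coordinates to 0..1,
    greedily drop overlapping boxes (higher confidence wins), convert to
    center format (cx, cy, w, h)."""
    results = []
    for dets in output:
        kept = []
        # sort by confidence so the greedy pass keeps the strongest box first
        for x1, y1, x2, y2, conf, cls in sorted(dets, key=lambda d: -d[4]):
            if conf < prob_threshold:
                continue
            box = (x1 / model_w, y1 / model_h, x2 / model_w, y2 / model_h)
            if all(iou(box, k["box"]) < overlap_threshold for k in kept):
                kept.append({"box": box, "confidence": conf, "class_id": int(cls)})
        results.append([{"cx": (k["box"][0] + k["box"][2]) / 2,
                         "cy": (k["box"][1] + k["box"][3]) / 2,
                         "w": k["box"][2] - k["box"][0],
                         "h": k["box"][3] - k["box"][1],
                         "confidence": k["confidence"],
                         "class_id": k["class_id"]} for k in kept])
    return results
```
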
### Image Processing

- Small images (≤1.5× model size): processed as a single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels; tiles overlap by a configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size is computed from GSD × pixel dimensions.
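
A hypothetical sketch of the tile-grid computation. The stride formula (advance by `size * (1 - overlap_pct)`) and the edge-clamping behavior are assumptions; only the tile-size formula comes from the source.

```python
def tile_origins(image_w, image_h, gsd, meters_in_tile, overlap_pct):
    """Return (x, y, size) origins of square tiles covering the image.
    Tile size in pixels = meters_in_tile / gsd (per the source); tiles
    advance by size * (1 - overlap_pct) so neighbors overlap (assumed)."""
    size = int(meters_in_tile / gsd)
    stride = max(1, int(size * (1 - overlap_pct)))
    tiles = []
    y = 0
    while y < image_h:
        x = 0
        while x < image_w:
            # clamp so the last tile in each row/column stays inside the image
            tiles.append((min(x, max(0, image_w - size)),
                          min(y, max(0, image_h - size)),
                          size))
            if x + size >= image_w:
                break
            x += stride
        if y + size >= image_h:
            break
        y += stride
    return tiles
```
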
### Video Processing

- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to the engine batch size
- Annotation validity: an annotation must differ from the previous one in at least one of these ways:
  - Time gap ≥ `frame_recognition_seconds`
  - More detections than the previous annotation
  - Any detection moved beyond the `tracking_distance_confidence` threshold
  - Any detection's confidence increased beyond `tracking_probability_increase`
- Valid frames get a JPEG-encoded image attached
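
The validity rules can be sketched as a predicate. This is an illustrative stand-in: annotations are plain dicts, detections are matched by list index, and movement is Euclidean distance between centers; none of those representation choices are confirmed by the source.

```python
import math

def annotation_is_valid(prev, curr, frame_recognition_seconds,
                        tracking_distance, tracking_probability_increase):
    """Sketch of the validity rules. prev/curr are dicts of the form
    {'time': float, 'detections': [{'cx', 'cy', 'confidence'}, ...]};
    detections are paired by index for simplicity (an assumption)."""
    if prev is None:
        return True
    if curr["time"] - prev["time"] >= frame_recognition_seconds:
        return True                                   # time-gap rule
    if len(curr["detections"]) > len(prev["detections"]):
        return True                                   # more detections
    for old, new in zip(prev["detections"], curr["detections"]):
        moved = math.hypot(new["cx"] - old["cx"], new["cy"] - old["cy"])
        if moved > tracking_distance:
            return True                               # detection moved
        if new["confidence"] - old["confidence"] > tracking_probability_increase:
            return True                               # confidence jumped
    return False
```
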
### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)` in meters per pixel; used for physical size filtering of aerial detections.
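
A worked example of the formula together with the size filter from the Image Processing section. The camera numbers are illustrative, and the unit convention (sensor width and focal length in the same unit, altitude in meters) is an assumption:

```python
def gsd_meters_per_pixel(sensor_width, altitude, focal_length, image_width):
    """GSD = sensor_width * altitude / (focal_length * image_width).
    sensor_width and focal_length must share a unit; altitude in meters."""
    return sensor_width * altitude / (focal_length * image_width)

def exceeds_max_size(det_w_px, det_h_px, gsd, max_object_size_meters):
    """Size filter: physical size = GSD * pixel dimensions; reject
    detections larger than the class's max_object_size_meters."""
    return max(det_w_px * gsd, det_h_px * gsd) > max_object_size_meters

# 13.2 mm sensor, 8.8 mm lens, 100 m altitude, 4000 px wide image
gsd = gsd_meters_per_pixel(0.0132, 100.0, 0.0088, 4000)   # 0.0375 m/px
```
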
## Dependencies

- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main`: lazily initializes Inference, calls `run_detect` and `detect_single_image`, reads `ai_availability_status`

## Data Models

Uses `Detection` and `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), and `AIAvailabilityStatus` (from ai_availability_status).

## Configuration

All runtime configuration arrives via an `AIRecognitionConfig` dict. Engine selection is automatic, based on GPU availability (checked at module level via pynvml).

## External Integrations

- **Loader service** (via `loader_client`): model download and upload

## Security

None.

## Tests

None found.