Mirror of https://github.com/azaion/detections.git (synced 2026-04-22 22:16:31 +00:00)
4.9 KiB
Module: inference
Purpose
Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
Public Interface
Class: Inference
Fields
| Field | Type | Access | Description |
|---|---|---|---|
| `loader_client` | object | internal | `LoaderHttpClient` instance |
| `engine` | InferenceEngine | internal | Active engine (`OnnxEngine` or `TensorRTEngine`), `None` if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `model_width` | int | internal | Model input width in pixels |
| `model_height` | int | internal | Model input height in pixels |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |
Methods
| Method | Signature | Access | Description |
|---|---|---|---|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images and videos, processes each |
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes; returns `list[Detection]` |
| `stop` | `()` | cpdef | Sets `stop_signal` to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV `blobFromImage`: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output into `Detection` objects; applies confidence threshold and overlap removal |
Internal Logic
Engine Initialization (init_ai)
- If `_converted_model_bytes` exists → load TensorRT from those bytes
- If a GPU is available → try downloading a pre-built TensorRT engine from the loader
- If the download fails → download the ONNX model and start a background thread for ONNX→TensorRT conversion
- If no GPU → load `OnnxEngine` from the ONNX model bytes
Preprocessing
- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
Postprocessing
- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) `Detection` objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy; keeps the higher-confidence detection)
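A minimal sketch of the greedy overlap removal, assuming IoU as the overlap measure and using a simplified stand-in for the `Detection` class (the real one lives in the `annotation` module):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    # Center-format box, coordinates normalized to 0..1, as described above.
    cx: float
    cy: float
    w: float
    h: float
    confidence: float
    class_id: int = 0

def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two center-format boxes."""
    ax1, ay1 = a.cx - a.w / 2, a.cy - a.h / 2
    ax2, ay2 = a.cx + a.w / 2, a.cy + a.h / 2
    bx1, by1 = b.cx - b.w / 2, b.cy - b.h / 2
    bx2, by2 = b.cx + b.w / 2, b.cy + b.h / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(dets, iou_threshold=0.5):
    """Greedy suppression: walk detections from highest to lowest confidence,
    keeping each one only if it does not overlap an already-kept detection
    beyond iou_threshold."""
    kept = []
    for d in sorted(dets, key=lambda d: d.confidence, reverse=True):
        if all(iou(d, k) <= iou_threshold for k in kept):
            kept.append(d)
    return kept
```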
Image Processing
- Small images (≤1.5× model size): processed as a single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels; tiles overlap by a configurable percentage
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed; physical size is computed from GSD × pixel dimensions
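The tiling arithmetic can be sketched as below; `meters_in_tile` and the overlap fraction are illustrative stand-ins for the module's `METERS_IN_TILE` constant and configured overlap percentage:

```python
def tile_origins(image_w, image_h, gsd, meters_in_tile=100.0, overlap=0.2):
    """Compute top-left origins of overlapping tiles for a large image.

    Tile edge in pixels = meters_in_tile / gsd, as described above.
    The stride between tiles is tile * (1 - overlap); a final tile is
    clamped to the right/bottom edge so the image is fully covered.
    """
    tile = int(meters_in_tile / gsd)                  # tile edge in pixels
    stride = max(1, int(tile * (1.0 - overlap)))      # step between tile origins
    xs = list(range(0, max(1, image_w - tile + 1), stride))
    ys = list(range(0, max(1, image_h - tile + 1), stride))
    if xs[-1] + tile < image_w:
        xs.append(image_w - tile)
    if ys[-1] + tile < image_h:
        ys.append(image_h - tile)
    return tile, [(x, y) for y in ys for x in xs]
```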
Video Processing
- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batches accumulated up to the engine batch size
- Annotation validity: an annotation must differ from the previous one by at least one of:
  - Time gap ≥ `frame_recognition_seconds`
  - More detections than the previous annotation
  - Any detection moved beyond the `tracking_distance_confidence` threshold
  - Any detection's confidence increased by more than `tracking_probability_increase`
- Valid frames get a JPEG-encoded image attached
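A simplified sketch of the validity check, using dicts and positional detection matching in place of the module's `Annotation`/`Detection` objects and tracker; the threshold values are illustrative:

```python
import math

def annotation_is_valid(prev, curr,
                        frame_recognition_seconds=2.0,
                        tracking_distance_confidence=0.05,
                        tracking_probability_increase=0.1):
    """Return True if curr differs enough from prev to be worth emitting.

    prev/curr are dicts with 'timestamp' (seconds) and 'detections'
    (list of (cx, cy, confidence) tuples). Detections are compared
    positionally here, a simplification of the real tracker's matching.
    """
    if prev is None:
        return True
    if curr["timestamp"] - prev["timestamp"] >= frame_recognition_seconds:
        return True                      # enough time has passed
    if len(curr["detections"]) > len(prev["detections"]):
        return True                      # more detections than before
    for (pcx, pcy, pconf), (ccx, ccy, cconf) in zip(prev["detections"],
                                                    curr["detections"]):
        if math.hypot(ccx - pcx, ccy - pcy) > tracking_distance_confidence:
            return True                  # a detection moved too far
        if cconf - pconf > tracking_probability_increase:
            return True                  # confidence rose too much
    return False
```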
Ground Sampling Distance (GSD)
GSD = sensor_width * altitude / (focal_length * image_width) — meters per pixel, used for physical size filtering of aerial detections.
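Worked numerically, with illustrative sensor and lens values (not from the module), keeping all lengths in meters:

```python
def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """GSD in meters per pixel: sensor_width * altitude / (focal_length * image_width)."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

# Example: a 13.2 mm wide sensor, 100 m altitude, 8.8 mm focal length,
# 5472 px wide image -> roughly 2.7 cm per pixel.
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 5472)
```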
Dependencies
- External: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- Internal: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)
Consumers
- `main`: lazy-initializes `Inference`, calls `run_detect` and `detect_single_image`, reads `ai_availability_status`
Data Models
Uses Detection, Annotation (from annotation), AIRecognitionConfig (from ai_config), AIAvailabilityStatus (from ai_availability_status).
Configuration
All runtime configuration arrives as an AIRecognitionConfig dict. Engine selection is automatic, based on GPU availability (checked at module level via `pynvml`).
External Integrations
- Loader service (via loader_client): model download/upload
Security
None.
Tests
None found.