# Module: inference

## Purpose

Core inference orchestrator: manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

## Public Interface

### Free Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |

### Class: Inference

#### Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine); None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `detection_counts` | dict[str, int] | internal | Detection count per media item |
| `is_building_engine` | bool | internal | True while the background TensorRT conversion is running |

#### Properties

| Property | Return Type | Description |
|----------|-------------|-------------|
| `is_engine_ready` | bool | True if `engine` is not None |
| `engine_name` | str or None | Engine type name of the active engine |

#### Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state and calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes the image from bytes, then runs tiling, inference, and postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV while concurrently writing the bytes to `save_path` |
| `stop` | `()` | cpdef | Sets `stop_signal` to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV `blobFromImage`: resize, scale pixel values to 0..1, swap the R and B channels, stack into a batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output into Detection objects, then applies the confidence threshold and overlap removal |

## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes.
2. If a GPU is available → try downloading a pre-built TensorRT engine from the loader.
3. If that download fails → download the ONNX model and start a background thread for ONNX→TensorRT conversion.
4. If no GPU is available → load OnnxEngine from the ONNX model bytes.

### Stream-Based Media Processing (AZ-173)

Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture, where media arrives as HTTP uploads or is read from storage by the API layer.

### Image Processing (`run_detect_image`)

1. Decodes the image bytes via `cv2.imdecode`.
2. Small images (≤1.5× model size) are processed as a single frame.
3. Large images are split into tiles based on GSD: tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by a configurable percentage.
4. Tile deduplication: detections from adjacent tiles are compared in absolute (full-image) coordinates.
5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed.

### Video Processing (`run_detect_video`)

1. Concurrently writes the raw bytes to `save_path` in a background thread (for persistent storage).
2. Opens the video from an in-memory `BytesIO` via PyAV (`av.open`).
3. Decodes frames via `container.decode(vstream)`; no temporary file is needed for reading.
4. Frame sampling: every Nth frame is processed (N = `frame_period_recognition`).
5. Sampled frames are accumulated into batches up to the engine batch size.
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement).
7. Valid frames get a JPEG-encoded image attached.

### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)`, in meters per pixel (with `sensor_width` and `focal_length` in the same unit, `altitude` in meters, and `image_width` in pixels). It is used for physical size filtering of aerial detections.

## Dependencies

- **External**: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main`: lazily initializes `Inference`, calls `run_detect_image`/`run_detect_video`, and reads `ai_availability_status` and `is_engine_ready`

## Data Models

Uses `Detection`, `Annotation` (from `annotation`), `AIRecognitionConfig` (from `ai_config`), and `AIAvailabilityStatus` (from `ai_availability_status`).

## Configuration

All runtime configuration is passed in as an `AIRecognitionConfig` built from a dict. Engine selection is automatic, based on GPU availability (checked at module level via pynvml).

## External Integrations

- **Loader service** (via `loader_client`): model download/upload

## Security

None.

## Tests

- `tests/test_ai_config_from_dict.py`: tests the `ai_config_from_dict` helper
- `e2e/tests/test_video.py`: exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py`: exercises `run_detect_image` via the full API
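
The fallback order in Engine Initialization (`init_ai`) can be illustrated with a minimal plain-Python sketch. It is not the module's Cython code: `select_engine` and its callable parameters are hypothetical, a download returning `None` stands in for a failed download, and `start_conversion` is assumed to launch the ONNX→TensorRT build on a background thread. The function only reports which path would be taken.

```python
from typing import Callable, Optional


def select_engine(
    converted_model_bytes: Optional[bytes],
    gpu_available: bool,
    download_trt_engine: Callable[[], Optional[bytes]],
    download_onnx_model: Callable[[], bytes],
    start_conversion: Callable[[bytes], None],
) -> str:
    """Return a label for the path init_ai would take (hypothetical helper)."""
    # 1. Bytes from an earlier ONNX->TensorRT conversion take priority.
    if converted_model_bytes:
        return "tensorrt (converted)"
    if gpu_available:
        # 2. Try a pre-built TensorRT engine; None models a failed download.
        if download_trt_engine() is not None:
            return "tensorrt (downloaded)"
        # 3. Fall back to the ONNX model; start_conversion is expected to
        #    run the ONNX->TensorRT build on a background thread.
        start_conversion(download_onnx_model())
        return "tensorrt (building)"
    # 4. No GPU: the ONNX model is loaded directly (OnnxEngine).
    download_onnx_model()
    return "onnx"
```

Passing the loader calls in as callables keeps the decision logic testable without a GPU or a running loader service.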
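
The confidence threshold and overlap removal performed by `postprocess` can be sketched as greedy IoU-based suppression. This is an assumption: the document does not say which overlap metric or threshold the engine uses, and `Candidate`, `filter_detections`, `min_confidence`, and `max_iou` are hypothetical names. The sketch is class-agnostic; the real postprocessing may only compare boxes of the same class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical stand-in for Detection: corner coordinates plus score."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def iou(a: Candidate, b: Candidate) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    candidates: List[Candidate], min_confidence: float, max_iou: float
) -> List[Candidate]:
    """Confidence threshold, then greedy overlap removal (highest score wins)."""
    confident = [c for c in candidates if c.confidence >= min_confidence]
    confident.sort(key=lambda c: c.confidence, reverse=True)
    kept: List[Candidate] = []
    for cand in confident:
        # Keep a box only if it does not overlap any stronger, already-kept box.
        if all(iou(cand, k) <= max_iou for k in kept):
            kept.append(cand)
    return kept
```

For two 10×10 boxes offset by one pixel, the IoU is 81/119 ≈ 0.68, so with `max_iou=0.5` only the higher-confidence box survives.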
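
Steps 2 and 3 of Image Processing (`run_detect_image`) can be sketched as a tile planner. The sketch makes several assumptions. "Small" is taken to mean that the longer side is within 1.5× the model input size. The overlap is given in percent. The last tile on each axis is shifted to sit flush with the image edge. `plan_tiles`, `tile_origins`, and their parameters are hypothetical, and `meters_in_tile` stands in for the module's `METERS_IN_TILE` constant, whose value is not given here.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap_pct: float) -> List[int]:
    """Start offsets along one axis so tiles cover [0, length) with overlap."""
    if length <= tile:
        return [0]
    step = max(1, tile - int(tile * overlap_pct / 100))
    origins = list(range(0, length - tile + 1, step))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # final tile flush with the far edge
    return origins


def plan_tiles(
    width: int,
    height: int,
    model_size: int,
    gsd: float,
    meters_in_tile: float,
    overlap_pct: float,
) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) crops; a single full-frame crop for small images."""
    # Assumption: "small" means the longer side is within 1.5x the model size.
    if max(width, height) <= 1.5 * model_size:
        return [(0, 0, width, height)]
    # Tile size in pixels = METERS_IN_TILE / GSD (GSD in meters per pixel).
    tile = max(1, int(meters_in_tile / gsd))
    return [
        (x, y, min(tile, width), min(tile, height))
        for y in tile_origins(height, tile, overlap_pct)
        for x in tile_origins(width, tile, overlap_pct)
    ]
```

For example, at 0.125 m/px with 125 m per tile, each tile is 1000 px. With 20% overlap the stride is 800 px, so a 4000×3000 image yields 5 × 4 = 20 tiles.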
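
Step 4 of Image Processing (tile deduplication in absolute coordinates) can be sketched as follows. Each tile-local box is shifted by its tile origin, and a box is dropped when it overlaps a stronger box from a different tile. The document only says that adjacent tiles are compared in absolute coordinates, so the IoU criterion, the `max_iou` default, the tuple box layout, and the names `to_absolute` and `dedupe_across_tiles` are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence) in tile-local pixels.
Box = Tuple[float, float, float, float, float]


def to_absolute(box: Box, origin: Tuple[int, int]) -> Box:
    """Shift a tile-local box by its tile's top-left offset."""
    ox, oy = origin
    x1, y1, x2, y2, conf = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy, conf)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2, conf) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dedupe_across_tiles(
    tiles: Sequence[Tuple[Tuple[int, int], List[Box]]], max_iou: float = 0.5
) -> List[Box]:
    """Drop boxes that duplicate a stronger box found in a different tile."""
    flat = [
        (tile_idx, to_absolute(box, origin))
        for tile_idx, (origin, boxes) in enumerate(tiles)
        for box in boxes
    ]
    flat.sort(key=lambda item: item[1][4], reverse=True)
    kept: List[Tuple[int, Box]] = []
    for tile_idx, box in flat:
        # Only boxes from other tiles count as duplicates; same-tile overlap
        # is assumed to have been handled already by postprocess.
        if all(k_idx == tile_idx or iou(box, k) <= max_iou for k_idx, k in kept):
            kept.append((tile_idx, box))
    return [box for _, box in kept]
```

An object in the overlap strip of two tiles is detected twice at the same absolute position. Only the higher-confidence copy is kept.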
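
The GSD formula and the size filter from Image Processing step 5 can be written out directly. `ground_sampling_distance` and `within_size_limit` are hypothetical names. The document does not say which box dimension is compared against `max_object_size_meters`, so this sketch assumes the longer side.

```python
def ground_sampling_distance(
    sensor_width: float, altitude_m: float, focal_length: float, image_width_px: int
) -> float:
    """Meters per pixel; sensor_width and focal_length must share a unit (e.g. mm)."""
    return sensor_width * altitude_m / (focal_length * image_width_px)


def within_size_limit(
    box_w_px: float, box_h_px: float, gsd: float, max_object_size_meters: float
) -> bool:
    """Assumption: the longer box side, converted to meters, is compared."""
    return max(box_w_px, box_h_px) * gsd <= max_object_size_meters
```

For example, an 8 mm sensor width, 8 mm focal length, 100 m altitude, and a 4000 px wide image give 8 · 100 / (8 · 4000) = 0.025 m/px. A 100 px box therefore spans 2.5 m and passes a 3 m limit, while a 200 px box (5 m) is removed.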
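
Video Processing steps 1 to 5 (background save, in-memory decoding, every-Nth-frame sampling, and batch accumulation) can be sketched in plain Python. `av.open` on a `BytesIO`, `container.decode(stream)`, and `VideoFrame.to_ndarray(format="bgr24")` are real PyAV calls. The helper names and the `stop_requested` hook (modeled on `stop_signal`) are hypothetical. Discarding a partially filled batch on stop is also an assumption of this sketch.

```python
import io
import threading
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def save_in_background(data: bytes, save_path: str) -> threading.Thread:
    """Write the raw upload to disk on a separate thread (step 1)."""
    def _write() -> None:
        with open(save_path, "wb") as f:
            f.write(data)

    thread = threading.Thread(target=_write, daemon=True)
    thread.start()
    return thread


def decode_video_frames(video_bytes: bytes) -> Iterator:
    """Decode frames from memory via PyAV, no temp file (steps 2-3)."""
    import av  # imported lazily so the other helpers work without PyAV

    with av.open(io.BytesIO(video_bytes)) as container:
        vstream = container.streams.video[0]
        for frame in container.decode(vstream):
            yield frame.to_ndarray(format="bgr24")


def sample_and_batch(
    frames: Iterable[T],
    frame_period: int,
    batch_size: int,
    stop_requested: Callable[[], bool] = lambda: False,
) -> Iterator[List[T]]:
    """Keep every frame_period-th frame and group them into batches (steps 4-5)."""
    batch: List[T] = []
    for index, frame in enumerate(frames):
        if stop_requested():
            return  # abort; a partially filled batch is discarded in this sketch
        if index % frame_period != 0:
            continue  # sample every Nth frame only
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the last, possibly short, batch
```

Under these assumptions, a caller would start `save_in_background(video_bytes, save_path)`, then iterate `sample_and_batch(decode_video_frames(video_bytes), frame_period_recognition, batch_size)`. Each batch would go to the engine while the file write proceeds independently.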