# Component: Inference Pipeline

## Overview

**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.

**Pattern**: Façade + Pipeline — the `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.

**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).

**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).

## Modules

| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from the Loader service |

## Internal Interfaces

### Inference

```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()

    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```

### LoaderHttpClient

```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```

## External API

None — internal component, consumed by the API layer.
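The façade's callback-driven contract can be sketched with a stub standing in for the real Cython class. `FakeInference`, the `media` config key, and the callback payloads are illustrative assumptions; only the signature `run_detect(config_dict, annotation_callback, status_callback=None)` comes from the interface above:

```python
# Hypothetical driver sketch of how the API layer calls the Inference
# facade. FakeInference is a stub; the real class initializes an engine,
# processes each media item, and invokes the callbacks.
collected = []

class FakeInference:
    def run_detect(self, config_dict, annotation_callback, status_callback=None):
        media = config_dict.get("media", [])
        for i, name in enumerate(media, start=1):
            percent = int(100 * i / len(media))
            # Called once per valid annotation, with overall progress.
            annotation_callback({"media": name}, percent)
            if status_callback:
                # Called when all detections for a media item are complete.
                status_callback(name, 1)

inf = FakeInference()
inf.run_detect(
    {"media": ["a.jpg", "b.jpg"]},
    annotation_callback=lambda ann, pct: collected.append((ann["media"], pct)),
    status_callback=lambda media_name, count: None,
)
print(collected)  # [('a.jpg', 50), ('b.jpg', 100)]
```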
## Data Access Patterns

- Model bytes downloaded from the Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV `VideoCapture`
- Images read via OpenCV `imread`
- All processing is in-memory

## Implementation Details

### Engine Initialization Strategy

```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If that fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
```

### Preprocessing

- `cv2.dnn.blobFromImage`: normalize to 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`

### Postprocessing

- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format `Detection` objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)

### Large Image Tiling

- Ground Sampling Distance: `GSD = sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate `Detection` equality across adjacent tiles
- Physical size filtering: remove detections exceeding the class `max_object_size_meters`

### Video Processing

- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection-count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images

### Callbacks

- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete

## Caveats

- `ThreadPoolExecutor` with `max_workers=2` limits concurrent inference (set in `main.py`)
- Background
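The greedy overlap filter described under Postprocessing can be sketched as follows. The `Detection` dataclass here is a minimal stand-in for the real domain model, and IoU over center-format boxes is an assumption about how "overlapping" is measured:

```python
# Sketch of greedy overlap removal: keep the higher-confidence detection,
# tie-breaking on the lower class_id. Detection is a minimal stand-in.
from dataclasses import dataclass

@dataclass
class Detection:
    cx: float; cy: float; w: float; h: float  # center-format box, 0..1
    conf: float
    class_id: int

def iou(a, b):
    # Intersection-over-union of two center-format boxes.
    ax1, ay1, ax2, ay2 = a.cx - a.w/2, a.cy - a.h/2, a.cx + a.w/2, a.cy + a.h/2
    bx1, by1, bx2, by2 = b.cx - b.w/2, b.cy - b.h/2, b.cx + b.w/2, b.cy + b.h/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(dets, threshold):
    # Greedy best-first: sort by confidence desc, class_id asc; drop any
    # detection that overlaps an already-kept one above the threshold.
    kept = []
    for d in sorted(dets, key=lambda d: (-d.conf, d.class_id)):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept

dets = [
    Detection(0.50, 0.5, 0.2, 0.2, 0.9, 1),
    Detection(0.51, 0.5, 0.2, 0.2, 0.8, 0),  # overlaps the first, lower conf
    Detection(0.90, 0.9, 0.1, 0.1, 0.8, 2),  # separate object
]
kept = remove_overlapping_detections(dets, threshold=0.5)
print([(d.conf, d.class_id) for d in kept])  # [(0.9, 1), (0.8, 2)]
```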
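The tiling math reduces to a couple of lines; `METERS_IN_TILE` and the camera parameters below are illustrative values, not the real constants:

```python
# Sketch of the GSD and tile-size formulas from the tiling section.
METERS_IN_TILE = 50.0  # assumed value for illustration

def gsd_meters_per_pixel(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    # Ground Sampling Distance: ground meters covered by one pixel.
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

def tile_size_px(gsd):
    # Each tile covers METERS_IN_TILE meters of ground.
    return int(METERS_IN_TILE / gsd)

# Illustrative drone-camera parameters: 13.2 mm sensor, 100 m altitude,
# 8.8 mm focal length, 4000 px image width.
gsd = gsd_meters_per_pixel(0.0132, 100.0, 0.0088, 4000)
print(round(gsd, 4), tile_size_px(gsd))  # 0.0375 m/px → 1333 px tiles
```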
TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation

## Dependency Graph

```mermaid
graph TD
    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client
```

## Logging Strategy

Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size-filtering decisions.