detections/_docs/02_document/modules/inference.md
Module: inference

Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

Public Interface

Class: Inference

Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| loader_client | object | internal | LoaderHttpClient instance |
| engine | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine); None if unavailable |
| ai_availability_status | AIAvailabilityStatus | public | Current AI readiness status |
| stop_signal | bool | internal | Flag to abort video processing |
| model_width | int | internal | Model input width in pixels |
| model_height | int | internal | Model input height in pixels |
| detection_counts | dict[str, int] | internal | Per-media detection count |
| is_building_engine | bool | internal | True during async TensorRT conversion |

Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| __init__ | (loader_client) | public | Initializes state, calls init_ai() |
| run_detect | (dict config_dict, annotation_callback, status_callback=None) | cpdef | Main entry: parses config, separates images and videos, processes each |
| detect_single_image | (bytes image_bytes, dict config_dict) -> list | cpdef | Single-image detection from raw bytes; returns list[Detection] |
| stop | () | cpdef | Sets stop_signal to True |
| init_ai | () | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| preprocess | (frames) -> ndarray | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap BGR→RGB, stack batch |
| postprocess | (output, ai_config) -> list[list[Detection]] | cdef | Parses engine output into Detection objects; applies confidence threshold and overlap removal |

Internal Logic

Engine Initialization (init_ai)

  1. If _converted_model_bytes exists → load TensorRT from those bytes
  2. If GPU available → try downloading pre-built TensorRT engine from loader
  3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
  4. If no GPU → load OnnxEngine from ONNX model bytes
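
The fallback cascade above can be sketched as a pure routing function with all I/O injected as callables. The names and signature here are hypothetical, for illustration only; the real init_ai works on instance state and the loader client.

```python
def init_engine(gpu_available, converted_model_bytes, download_trt, download_onnx,
                load_trt, load_onnx, start_background_conversion):
    # 1. Previously converted TensorRT bytes take priority.
    if converted_model_bytes is not None:
        return load_trt(converted_model_bytes)
    if gpu_available:
        # 2. Try a pre-built TensorRT engine from the loader service.
        trt_bytes = download_trt()
        if trt_bytes is not None:
            return load_trt(trt_bytes)
        # 3. Download fails: run ONNX now, convert to TensorRT in the background.
        onnx_bytes = download_onnx()
        start_background_conversion(onnx_bytes)
        return load_onnx(onnx_bytes)
    # 4. No GPU: ONNX engine only.
    return load_onnx(download_onnx())
```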

Preprocessing

  • cv2.dnn.blobFromImage: scale 1/255, resize to model dims, BGR→RGB, no crop
  • Stack multiple frames via np.vstack for batched inference

Postprocessing

  • Engine output format: [batch][detection_index][x1, y1, x2, y2, confidence, class_id]
  • Coordinates normalized to 0..1 by dividing by model width/height
  • Converted to center-format (cx, cy, w, h) Detection objects
  • Filtered by probability_threshold
  • Overlapping detections removed via remove_overlapping_detections (greedy, keeps higher confidence)
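
A minimal sketch of this pipeline, using plain dicts in place of Detection objects and plain IoU for the overlap rule; the module's remove_overlapping_detections and its thresholds may differ:

```python
def _iou(a, b):
    # IoU of two center-format boxes given as dicts with cx, cy, w, h.
    ax1, ay1 = a["cx"] - a["w"] / 2, a["cy"] - a["h"] / 2
    ax2, ay2 = a["cx"] + a["w"] / 2, a["cy"] + a["h"] / 2
    bx1, by1 = b["cx"] - b["w"] / 2, b["cy"] - b["h"] / 2
    bx2, by2 = b["cx"] + b["w"] / 2, b["cy"] + b["h"] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0

def postprocess(output, model_w, model_h, prob_threshold=0.5, iou_threshold=0.5):
    # output: [batch][detection][x1, y1, x2, y2, confidence, class_id]
    results = []
    for batch in output:
        dets = []
        for x1, y1, x2, y2, conf, cls in batch:
            if conf < prob_threshold:  # confidence filter
                continue
            dets.append({
                "cx": (x1 + x2) / 2 / model_w,  # normalize to 0..1
                "cy": (y1 + y2) / 2 / model_h,  # and convert to center format
                "w": (x2 - x1) / model_w,
                "h": (y2 - y1) / model_h,
                "conf": conf,
                "cls": int(cls),
            })
        # Greedy overlap removal: sort by confidence descending, keep a
        # detection only if it does not overlap an already-kept one.
        dets.sort(key=lambda d: d["conf"], reverse=True)
        kept = []
        for d in dets:
            if all(_iou(d, k) < iou_threshold for k in kept):
                kept.append(d)
        results.append(kept)
    return results
```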

Image Processing

  • Small images (≤1.5× model size): processed as single frame
  • Large images: split into tiles based on ground sampling distance. Tile size = METERS_IN_TILE / GSD pixels. Tiles overlap by configurable percentage.
  • Tile deduplication: absolute-coordinate comparison across adjacent tiles using Detection.__eq__
  • Size filtering: detections whose physical size (meters) exceeds AnnotationClass.max_object_size_meters are removed. Physical size computed from GSD × pixel dimensions.
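
The tiling arithmetic above can be sketched like this; the METERS_IN_TILE value and overlap fraction are illustrative assumptions, not the module's constants:

```python
def tile_origins(length, tile, stride):
    # Tile start offsets along one axis, ensuring the last tile touches the edge.
    origins = list(range(0, max(length - tile, 0) + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins

def tile_grid(image_w, image_h, gsd, meters_in_tile=100.0, overlap=0.2):
    tile = int(meters_in_tile / gsd)            # tile edge in px = METERS_IN_TILE / GSD
    stride = max(1, int(tile * (1 - overlap)))  # adjacent tiles overlap by `overlap`
    return [(x, y, tile)
            for y in tile_origins(image_h, tile, stride)
            for x in tile_origins(image_w, tile, stride)]
```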

Video Processing

  • Frame sampling: every Nth frame (frame_period_recognition)
  • Batch accumulation up to engine batch size
  • Annotation validity: a frame's annotation must differ from the previous one in at least one of the following ways:
    • Time gap ≥ frame_recognition_seconds
    • More detections than previous
    • Any detection moved beyond tracking_distance_confidence threshold
    • Any detection confidence increased beyond tracking_probability_increase
  • Valid frames get JPEG-encoded image attached
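
A minimal sketch of the validity rule under simplifying assumptions: detections are paired positionally (the module's tracking presumably matches them properly), thresholds are illustrative defaults, and annotations are plain dicts with 'time' (seconds) and 'detections':

```python
def annotation_is_valid(prev, curr, frame_recognition_seconds=5.0,
                        tracking_distance=0.05, tracking_probability_increase=0.1):
    if prev is None:
        return True  # first annotation is always valid
    if curr["time"] - prev["time"] >= frame_recognition_seconds:
        return True  # enough time elapsed since the previous annotation
    if len(curr["detections"]) > len(prev["detections"]):
        return True  # more detections than before
    for c, p in zip(curr["detections"], prev["detections"]):
        moved = ((c["cx"] - p["cx"]) ** 2 + (c["cy"] - p["cy"]) ** 2) ** 0.5
        if moved > tracking_distance:
            return True  # a detection moved beyond the tracking threshold
        if c["conf"] - p["conf"] > tracking_probability_increase:
            return True  # a detection's confidence rose significantly
    return False
```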

Ground Sampling Distance (GSD)

GSD = sensor_width * altitude / (focal_length * image_width) — meters per pixel, used for physical size filtering of aerial detections.
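
Worked through with illustrative drone-camera values (not taken from the module), with all lengths in meters:

```python
def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    # GSD in meters per pixel: sensor_width * altitude / (focal_length * image_width)
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

# 13.2 mm sensor, 100 m altitude, 8.8 mm focal length, 5472 px wide image
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 5472)  # ~0.0274 m/px
```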

Dependencies

  • External: cv2, numpy, pynvml, mimetypes, pathlib, threading
  • Internal: constants_inf, ai_availability_status, annotation, ai_config, tensorrt_engine (conditional), onnx_engine (conditional), inference_engine (type)

Consumers

  • main — lazy-initializes Inference, calls run_detect, detect_single_image, reads ai_availability_status

Data Models

Uses Detection, Annotation (from annotation), AIRecognitionConfig (from ai_config), AIAvailabilityStatus (from ai_availability_status).

Configuration

All runtime configuration comes in via the AIRecognitionConfig dict. Engine selection is automatic, based on GPU availability (checked at module level via pynvml).

External Integrations

  • Loader service (via loader_client): model download/upload

Security

None.

Tests

None found.