detections/_docs/02_document/components/03_inference_pipeline/description.md

Component: Inference Pipeline

Overview

Purpose: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.

Pattern: Façade + Pipeline — Inference class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.

Upstream: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient). Downstream: API (creates Inference, calls run_detect and detect_single_image).

Modules

Module               Role
inference            Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing
loader_http_client   HTTP client for model download from / upload to the Loader service

Internal Interfaces

Inference

cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()

    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)

LoaderHttpClient

class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult

External API

None — internal component, consumed by API layer.

Data Access Patterns

  • Model bytes downloaded from Loader service (HTTP)
  • Converted TensorRT engines uploaded back to Loader for caching
  • Video frames read via OpenCV VideoCapture
  • Images read via OpenCV imread
  • All processing is in-memory

Implementation Details

Engine Initialization Strategy

1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If that fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
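The fallback strategy above can be sketched as a small state machine. Everything here — the class names, the injected Loader stubs, and the interim-ONNX behavior while conversion runs — is a hypothetical stand-in for the real Cython code, not its actual implementation:

```python
import threading

class EngineNotFound(Exception):
    pass

class OnnxEngine:                  # stand-in for the real ONNX engine
    kind = "onnx"
    def __init__(self, model_bytes): self.model = model_bytes

class TensorRTEngine:              # stand-in for the real TensorRT engine
    kind = "tensorrt"
    def __init__(self, model_bytes): self.model = model_bytes

class InferenceSketch:
    def __init__(self, has_gpu, load_trt, download_onnx):
        self._has_gpu = has_gpu             # step 1: pynvml check, CC >= 6.1
        self._load_trt = load_trt           # raises EngineNotFound on cache miss
        self._download_onnx = download_onnx
        self._converted_model_bytes = None  # set by the background thread (step 2c)

    def init_ai(self):
        if not self._has_gpu():
            return OnnxEngine(self._download_onnx())            # step 3a: CPU path
        if self._converted_model_bytes is not None:
            return TensorRTEngine(self._converted_model_bytes)  # step 2d
        try:
            return TensorRTEngine(self._load_trt())             # step 2a
        except EngineNotFound:
            onnx = self._download_onnx()                        # step 2b
            threading.Thread(target=self._convert, args=(onnx,),
                             daemon=True).start()
            return OnnxEngine(onnx)  # assumed interim engine while converting

    def _convert(self, onnx_bytes):
        # the real thread converts ONNX -> TensorRT and uploads the result to Loader
        self._converted_model_bytes = b"trt:" + onnx_bytes
```

A second `init_ai()` call after the background thread finishes picks up the converted bytes, matching step 2d.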

Preprocessing

  • cv2.dnn.blobFromImage: normalize 0..1, resize to model input, BGR→RGB
  • Batch via np.vstack
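A NumPy-only sketch of what this stage produces per frame (scale to 0..1, BGR→RGB, channel reorder, batch dimension) and how frames are batched; the resize step and the exact blobFromImage arguments are omitted, so treat the shapes as illustrative rather than the component's actual code:

```python
import numpy as np

def to_blob(frame_bgr):
    """HxWx3 uint8 BGR frame -> 1x3xHxW float32 blob in 0..1, RGB order."""
    rgb = frame_bgr[..., ::-1]                       # BGR -> RGB (swapRB)
    chw = rgb.transpose(2, 0, 1).astype(np.float32)  # HWC -> CHW
    return (chw / 255.0)[np.newaxis]                 # scale to 0..1, add batch dim

def make_batch(frames):
    """Stack per-frame blobs into an N x 3 x H x W batch via np.vstack."""
    return np.vstack([to_blob(f) for f in frames])
```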

Postprocessing

  • Parse [batch][det][x1,y1,x2,y2,conf,cls] output
  • Normalize coordinates to 0..1
  • Convert to center-format Detection objects
  • Filter by confidence threshold
  • Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
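The greedy overlap filter can be sketched in plain Python; the simplified Detection class and the IoU-based overlap test are assumptions standing in for the real Cython types:

```python
from dataclasses import dataclass

@dataclass
class Detection:           # simplified stand-in for the real Detection type
    x1: float; y1: float; x2: float; y2: float
    conf: float; class_id: int

def iou(a, b):
    """Intersection-over-union of two corner-format boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0

def remove_overlapping(dets, threshold=0.5):
    """Greedy: visit by descending confidence (ties -> lower class_id first),
    keep a detection only if it does not overlap anything already kept."""
    kept = []
    for d in sorted(dets, key=lambda d: (-d.conf, d.class_id)):
        if all(iou(d, k) < threshold for k in kept):
            kept.append(d)
    return kept
```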

Large Image Tiling

  • Ground Sampling Distance: sensor_width * altitude / (focal_length * image_width)
  • Tile size: METERS_IN_TILE / GSD pixels
  • Overlap: configurable percentage
  • Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
  • Physical size filtering: remove detections exceeding class max_object_size_meters
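The tiling arithmetic from the first two bullets, as a sketch; the value of METERS_IN_TILE and the sample camera numbers are illustrative assumptions, not the component's actual constants:

```python
METERS_IN_TILE = 100.0  # assumed value; the real constant lives in the component

def gsd_m_per_px(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """Ground Sampling Distance: meters of ground covered by one pixel."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

def tile_size_px(gsd):
    """Edge length of a tile, in pixels, covering METERS_IN_TILE meters."""
    return int(METERS_IN_TILE / gsd)

# illustrative numbers: 13.2 mm sensor, 100 m altitude, 8.8 mm lens, 5472 px wide
gsd = gsd_m_per_px(0.0132, 100.0, 0.0088, 5472)
```

Higher altitude means a larger GSD (coarser ground resolution) and therefore fewer, smaller tiles in pixel terms.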

Video Processing

  • Frame sampling: every Nth frame
  • Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
  • JPEG encoding of valid frames for annotation images
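A pure-Python sketch of the sampling step and a reduced version of the validity heuristics (time gap, detection-count increase, confidence improvement; the spatial-movement check is omitted for brevity). The dict fields and the default threshold are illustrative assumptions:

```python
def sampled_indices(total_frames, n):
    """Indices of the frames that reach inference when sampling every Nth frame."""
    return [i for i in range(total_frames) if i % n == 0]

def is_valid_annotation(prev, cur, min_gap_s=1.0):
    """prev/cur: {'t': seconds, 'count': detections, 'conf': best confidence}."""
    if prev is None:
        return True                                  # first annotation is always valid
    return (cur['t'] - prev['t'] >= min_gap_s        # enough time elapsed
            or cur['count'] > prev['count']          # more objects appeared
            or cur['conf'] > prev['conf'])           # confidence improved
```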

Callbacks

  • annotation_callback(annotation, percent) — called per valid annotation
  • status_callback(media_name, count) — called when all detections for a media item are complete
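The calling convention can be illustrated with a toy driver; process_media and its arguments are hypothetical, and only the two callback signatures come from the description above:

```python
def process_media(media_items, annotation_callback, status_callback=None):
    """media_items: list of (media_name, [annotation, ...]) pairs."""
    total = len(media_items)
    for i, (name, annotations) in enumerate(media_items, start=1):
        for ann in annotations:
            annotation_callback(ann, int(100 * i / total))  # per valid annotation
        if status_callback is not None:
            status_callback(name, len(annotations))         # once per finished media item
```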

Caveats

  • ThreadPoolExecutor with max_workers=2 limits concurrent inference (set in main.py)
  • Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
  • init_ai() called on every run_detect — idempotent but checks engine state each time
  • Video processing is sequential per video (no parallel video processing)
  • _tile_detections dict is instance-level state that persists across image calls within a single run_detect invocation

Dependency Graph

graph TD
    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client

Logging Strategy

Extensive logging via constants_inf.log: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.