Component: Inference Pipeline
Overview
Purpose: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
Pattern: Façade + Pipeline — Inference class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
Upstream: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
Downstream: API (creates Inference, calls run_detect and detect_single_image).
Modules
| Module | Role |
|---|---|
| inference | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| loader_http_client | HTTP client for model download from / upload to the Loader service |
Internal Interfaces
Inference
```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()
    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```
LoaderHttpClient
```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```
External API
None — internal component, consumed by API layer.
Data Access Patterns
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
Implementation Details
Engine Initialization Strategy
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
a. Try loading pre-built TensorRT engine from Loader
b. If fails → download ONNX model → start background conversion thread
c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
a. Download ONNX model from Loader → create OnnxEngine
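The fallback strategy above can be sketched as follows. This is an illustrative sketch, not the real implementation: the collaborator names (`loader`, `gpu_available`, `load_trt`, `convert`) are assumptions, only `init_ai()` and `_converted_model_bytes` appear in this document, and the document does not state which engine serves requests while conversion runs in the background (the sketch assumes the ONNX engine does).

```python
import threading

class EngineSelector:
    def __init__(self, loader, gpu_available, load_trt, convert):
        self.loader = loader                  # download(name) / upload(content, name)
        self.gpu_available = gpu_available    # () -> bool
        self.load_trt = load_trt              # engine bytes -> engine, may raise
        self.convert = convert                # ONNX bytes -> TensorRT bytes
        self._converted_model_bytes = None
        self._conv_thread = None

    def init_ai(self):
        cached = self._converted_model_bytes
        if cached is not None:
            # Step 2d: a background conversion finished earlier, load its result.
            return ("tensorrt", self.load_trt(cached))
        if not self.gpu_available():
            # Step 3: CPU-only path, plain ONNX engine.
            return ("onnx", self.loader.download("model.onnx"))
        try:
            # Step 2a: prefer a pre-built TensorRT engine from the Loader.
            return ("tensorrt", self.load_trt(self.loader.download("model.trt")))
        except Exception:
            # Steps 2b-2c: fall back to ONNX, convert in a daemon thread.
            onnx_bytes = self.loader.download("model.onnx")
            self._conv_thread = threading.Thread(
                target=self._convert_bg, args=(onnx_bytes,), daemon=True)
            self._conv_thread.start()
            return ("onnx", onnx_bytes)

    def _convert_bg(self, onnx_bytes):
        trt_bytes = self.convert(onnx_bytes)
        self.loader.upload(trt_bytes, "model.trt")   # cache for later runs
        self._converted_model_bytes = trt_bytes
```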
Preprocessing
- `cv2.dnn.blobFromImage`: normalize to 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`
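A pure-numpy sketch of that stage. The real pipeline uses `cv2.dnn.blobFromImage`; this reproduces its effect (scale 1/255, BGR→RGB swap, HWC→NCHW layout) under the assumption that frames are already resized to the model input size:

```python
import numpy as np

def preprocess(frames):
    blobs = []
    for frame in frames:
        x = frame.astype(np.float32) / 255.0   # normalize to 0..1
        x = x[:, :, ::-1]                      # BGR -> RGB
        x = x.transpose(2, 0, 1)[np.newaxis]   # HWC -> 1xCxHxW
        blobs.append(x)
    return np.vstack(blobs)                    # stack into one NCHW batch
```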
Postprocessing
- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
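The greedy overlap removal can be sketched like this. The `Det` class is a hypothetical stand-in for the Detection model, and IoU as the overlap measure is an assumption; the document only specifies the greedy ordering (higher confidence wins, ties broken by lower class_id):

```python
from dataclasses import dataclass

@dataclass
class Det:
    x1: float; y1: float; x2: float; y2: float   # corner-format box
    conf: float
    class_id: int

def iou(a, b):
    # Intersection-over-union of two corner-format boxes.
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(dets, threshold):
    # Greedy pass: higher confidence first, ties broken by lower class_id,
    # so each kept detection suppresses later boxes that overlap it.
    kept = []
    for d in sorted(dets, key=lambda d: (-d.conf, d.class_id)):
        if all(iou(d, k) < threshold for k in kept):
            kept.append(d)
    return kept
```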
Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
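The tiling arithmetic above, as a worked sketch. The `METERS_IN_TILE` value and the camera numbers are illustrative assumptions, not the component's real constants:

```python
METERS_IN_TILE = 100.0   # assumed value for illustration

def ground_sampling_distance(sensor_width, altitude, focal_length, image_width):
    # Metres of ground covered per pixel; sensor_width, altitude and
    # focal_length in metres, image_width in pixels.
    return sensor_width * altitude / (focal_length * image_width)

def tile_size_px(gsd):
    # Pixels per tile so one tile spans METERS_IN_TILE metres of ground.
    return round(METERS_IN_TILE / gsd)

# Example: 13.2 mm sensor, 8.8 mm focal length, 5280 px wide, flying at 100 m.
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 5280)
```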
Video Processing
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
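The sampling rule and validity heuristics can be sketched as below. `sample_frames` and `annotation_is_valid` are hypothetical helpers: the real pipeline reads frames via `cv2.VideoCapture`, and the actual thresholds and the spatial-movement check are not given in this document.

```python
def sample_frames(frames, step):
    # Keep every Nth frame (step = N), paired with its frame index.
    return [(i, f) for i, f in enumerate(frames) if i % step == 0]

def annotation_is_valid(prev, cur, min_gap=2.0, min_extra_conf=0.1):
    # Heuristics named above: time gap, detection count increase,
    # confidence improvement. Thresholds here are assumed.
    if prev is None:
        return True                                   # first annotation always emitted
    if cur["time"] - prev["time"] >= min_gap:
        return True                                   # enough time has passed
    if len(cur["dets"]) > len(prev["dets"]):
        return True                                   # new objects appeared
    if cur["conf"] >= prev["conf"] + min_extra_conf:
        return True                                   # clearly better confidence
    return False
```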
Callbacks
- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete
Caveats
- `ThreadPoolExecutor` with `max_workers=2` limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
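A small demonstration of the first caveat, with `fake_run_detect` standing in for the real inference call: a pool with `max_workers=2` never runs more than two jobs at once, so a third `run_detect` submission queues until a worker frees up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0
active = 0
lock = threading.Lock()

def fake_run_detect(_job):
    # Track how many jobs are in flight simultaneously.
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)           # stand-in for actual inference work
    with lock:
        active -= 1

with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(fake_run_detect, range(6)))
# peak never exceeds 2: the extra four jobs waited in the queue
```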
Dependency Graph
```mermaid
graph TD
    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client
```
Logging Strategy
Extensive logging via constants_inf.log: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.