Module: inference
Purpose
Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
Public Interface
Free Functions
| Function | Signature | Description |
|---|---|---|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |
Class: Inference
Fields
| Field | Type | Access | Description |
|---|---|---|---|
| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |
Properties
| Property | Return Type | Description |
|---|---|---|
| `is_engine_ready` | bool | True if engine is not None |
| `engine_name` | str or None | Engine type name from the active engine |
Methods
| Method | Signature | Access | Description |
|---|---|---|---|
| `__init__` | `(loader_client)` | public | Initializes state, calls init_ai() |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap R and B channels, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
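
The `postprocess` row mentions a confidence threshold followed by overlap removal. The sketch below illustrates one common way to do this: greedy suppression based on intersection over union (IoU). It is an assumption, not the module's confirmed algorithm; `Box`, `iou`, `filter_detections`, and the default thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Box:
    # Hypothetical stand-in for the module's Detection type.
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Iterable[Box],
    min_confidence: float = 0.25,  # assumed threshold, not from the source
    max_iou: float = 0.5,          # assumed threshold, not from the source
) -> List[Box]:
    """Drop low-confidence boxes, then keep only the highest-confidence
    box among any set of boxes that overlap by more than max_iou."""
    candidates = sorted(
        (b for b in boxes if b.confidence >= min_confidence),
        key=lambda b: b.confidence,
        reverse=True,
    )
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, other) <= max_iou for other in kept):
            kept.append(box)
    return kept
```

Sorting by confidence first means that when two boxes overlap, the more confident one always survives.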
Internal Logic
Engine Initialization (init_ai)
- If `_converted_model_bytes` exists → load TensorRT from those bytes
- If GPU available → try downloading pre-built TensorRT engine from loader
- If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
- If no GPU → load OnnxEngine from ONNX model bytes
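
The fallback order above can be read as a small decision function. The following is a minimal sketch of that reading, not the actual `init_ai` code: `EngineChoice`, `choose_engine`, and the `download_tensorrt` callable are hypothetical, and the real method also loads the engine and starts the conversion thread.

```python
from enum import Enum
from typing import Callable, Optional


class EngineChoice(Enum):
    # Hypothetical labels for the four outcomes listed above.
    TENSORRT_FROM_CONVERTED = "tensorrt_from_converted_bytes"
    TENSORRT_DOWNLOADED = "tensorrt_downloaded"
    ONNX_THEN_CONVERT = "onnx_with_background_conversion"
    ONNX_CPU = "onnx_cpu"


def choose_engine(
    converted_model_bytes: Optional[bytes],
    gpu_available: bool,
    download_tensorrt: Callable[[], Optional[bytes]],
) -> EngineChoice:
    """Return which initialization path applies, following the list order."""
    # 1. Bytes from a finished conversion take priority.
    if converted_model_bytes is not None:
        return EngineChoice.TENSORRT_FROM_CONVERTED
    # 4. Without a GPU there is nothing to try except ONNX.
    if not gpu_available:
        return EngineChoice.ONNX_CPU
    # 2. With a GPU, try the pre-built TensorRT engine first.
    try:
        engine_bytes = download_tensorrt()
    except Exception:
        engine_bytes = None  # treat any download error as "not available"
    if engine_bytes:
        return EngineChoice.TENSORRT_DOWNLOADED
    # 3. Download failed: use ONNX now and convert in the background.
    return EngineChoice.ONNX_THEN_CONVERT
```

Keeping the decision separate from the loading side effects makes each branch easy to exercise in a unit test.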
Stream-Based Media Processing (AZ-173)
Both run_detect_image and run_detect_video accept raw bytes instead of file paths. This supports the distributed architecture where media arrives as HTTP uploads or is read from storage by the API layer.
Image Processing (run_detect_image)
- Decodes image bytes via `cv2.imdecode`
- Small images (≤1.5× model size): processed as single frame
- Large images: split into tiles based on GSD. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles
- Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed
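
To make the tiling arithmetic concrete, here is a minimal sketch of how a tile grid with percentage overlap could be computed, and how a tile-local box maps to absolute image coordinates for deduplication. The function names, the rounding, and the choice to place the last tile flush with the image edge are assumptions; the module's own layout may differ.

```python
from typing import List, Tuple


def tile_size_px(meters_in_tile: float, gsd: float) -> int:
    """Tile edge in pixels: METERS_IN_TILE / GSD, rounded to a whole pixel."""
    return max(1, int(round(meters_in_tile / gsd)))


def tile_origins(length: int, tile: int, overlap_percent: float) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if tile >= length:
        return [0]
    stride = max(1, int(tile * (1.0 - overlap_percent / 100.0)))
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # assumed: last tile sits flush with the edge
    return origins


def split_into_tiles(
    width: int, height: int, tile: int, overlap_percent: float
) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) rectangles in row-major order."""
    return [
        (x, y, min(tile, width), min(tile, height))
        for y in tile_origins(height, tile, overlap_percent)
        for x in tile_origins(width, tile, overlap_percent)
    ]


def to_absolute(
    box: Tuple[float, float, float, float], tile_x: int, tile_y: int
) -> Tuple[float, float, float, float]:
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + tile_x, y1 + tile_y, x2 + tile_x, y2 + tile_y)
```

Once every detection is expressed in absolute coordinates, boxes reported by two adjacent tiles for the same object can be compared directly.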
Video Processing (run_detect_video)
- Concurrently writes raw bytes to `save_path` in a background thread (for persistent storage)
- Opens video from in-memory `BytesIO` via PyAV (`av.open`)
- Decodes frames via `container.decode(vstream)`, so no temporary file is needed for reading
- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
- Valid frames get JPEG-encoded image attached
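
The sampling and batching steps above can be sketched independently of PyAV. The generator below is illustrative only: `sample_and_batch` is a hypothetical name, it works on any iterable of frames, and starting the sampling at frame 0 is an assumption.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def sample_and_batch(
    frames: Iterable[T], frame_period: int, batch_size: int
) -> Iterator[List[Tuple[int, T]]]:
    """Keep every frame_period-th frame and yield them in batches.

    Each batch item is (frame_index, frame), so results can be mapped
    back to their position in the video.
    """
    batch: List[Tuple[int, T]] = []
    for index, frame in enumerate(frames):
        if index % frame_period != 0:
            continue  # skipped by sampling
        batch.append((index, frame))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly partial batch
```

In the real pipeline the iterable would be the decoded frame stream, `frame_period` would come from `frame_period_recognition`, and `batch_size` from the engine.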
Ground Sampling Distance (GSD)
GSD = sensor_width * altitude / (focal_length * image_width) — meters per pixel, used for physical size filtering of aerial detections.
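
As a worked example of the formula, the sketch below computes GSD and applies a physical size limit. It assumes `sensor_width` and `focal_length` share one unit (for example millimetres) and `altitude` is in metres; measuring an object by the longer side of its bounding box is also an assumption, and the function names are hypothetical.

```python
def ground_sampling_distance(
    sensor_width: float, altitude: float, focal_length: float, image_width: int
) -> float:
    """Metres per pixel: sensor_width * altitude / (focal_length * image_width)."""
    return sensor_width * altitude / (focal_length * image_width)


def object_size_meters(width_px: float, height_px: float, gsd: float) -> float:
    """Physical size of a box, taken here as its longer side (assumed)."""
    return max(width_px, height_px) * gsd


def exceeds_max_size(
    width_px: float, height_px: float, gsd: float, max_object_size_meters: float
) -> bool:
    """True if the detection should be removed by size filtering."""
    return object_size_meters(width_px, height_px, gsd) > max_object_size_meters


# Illustrative values: 10 mm sensor, 100 m altitude, 20 mm lens, 1000 px wide.
gsd = ground_sampling_distance(10.0, 100.0, 20.0, 1000)  # 0.05 m per pixel
```

With that GSD, a 400 px wide box spans about 20 m on the ground, so it would be dropped for a class limited to 10 m.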
Dependencies
- External: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- Internal: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)
Consumers
- `main`: lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`, reads `ai_availability_status` and `is_engine_ready`
Data Models
Uses Detection, Annotation (from annotation), AIRecognitionConfig (from ai_config), AIAvailabilityStatus (from ai_availability_status).
Configuration
All runtime config comes via the AIRecognitionConfig dict. Engine selection is automatic, based on GPU availability (checked at module level via pynvml).
External Integrations
- Loader service (via loader_client): model download/upload
Security
None.
Tests
- `tests/test_ai_config_from_dict.py`: tests `ai_config_from_dict` helper
- `e2e/tests/test_video.py`: exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py`: exercises `run_detect_image` via the full API