detections/_docs/02_document/modules/inference.md
Oleksandr Bezdieniezhnykh 1fe9425aa8 [AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow
- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
2026-03-31 17:25:58 +03:00


Module: inference

Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

Public Interface

Free Functions

| Function | Signature | Description |
|---|---|---|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |

Class: Inference

Fields

| Field | Type | Access | Description |
|---|---|---|---|
| `loader_client` | `LoaderHttpClient` | internal | HTTP client for model download/upload |
| `engine` | `InferenceEngine` | internal | Active engine (`OnnxEngine` or `TensorRTEngine`), `None` if unavailable |
| `ai_availability_status` | `AIAvailabilityStatus` | public | Current AI readiness status |
| `stop_signal` | `bool` | internal | Flag to abort video processing |
| `detection_counts` | `dict[str, int]` | internal | Per-media detection count |
| `is_building_engine` | `bool` | internal | `True` during async TensorRT conversion |

Properties

| Property | Return Type | Description |
|---|---|---|
| `is_engine_ready` | `bool` | `True` if `engine` is not `None` |
| `engine_name` | `str` or `None` | Engine type name from the active engine |

Methods

| Method | Signature | Access | Description |
|---|---|---|---|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to `save_path` |
| `stop` | `()` | cpdef | Sets `stop_signal` to `True` |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV `blobFromImage`: resize, scale pixel values to 0..1, swap the R and B channels (BGR to RGB), stack into a batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to `Detection` objects, applies confidence threshold and overlap removal |
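
To illustrate the two filters named for postprocess, here is a minimal sketch of a confidence threshold followed by greedy overlap removal. The `Det` tuple, the `filter_detections` helper, the `min_confidence` and `max_iou` parameters, and the use of IoU as the overlap measure are all assumptions made for illustration; the module's actual `Detection` fields and overlap criterion may differ.

```python
from typing import NamedTuple


class Det(NamedTuple):
    # Hypothetical stand-in for the module's Detection class.
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, min_confidence=0.25, max_iou=0.5):
    """Drop low-confidence boxes, then keep the most confident box of each overlapping group.

    Both thresholds are illustrative defaults, not values taken from the module.
    """
    confident = [d for d in dets if d.confidence >= min_confidence]
    kept = []
    for d in sorted(confident, key=lambda d: d.confidence, reverse=True):
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(d, k) <= max_iou for k in kept):
            kept.append(d)
    return kept
```

Sorting by confidence first means that, within a cluster of overlapping boxes, the strongest one always survives.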

Internal Logic

Engine Initialization (init_ai)

  1. If _converted_model_bytes exists → load TensorRT from those bytes
  2. If GPU available → try downloading pre-built TensorRT engine from loader
  3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
  4. If no GPU → load OnnxEngine from ONNX model bytes
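
The decision order above can be sketched as a small pure function. This is only an illustration: `choose_engine` and its callable parameters (`download_trt`, `download_onnx`, `start_conversion`) are hypothetical stand-ins for the loader calls and the background conversion thread, and the sketch assumes the ONNX engine serves requests while the conversion runs.

```python
def choose_engine(converted_bytes, gpu_available, download_trt, download_onnx, start_conversion):
    """Return (engine_kind, model_bytes) following the four documented steps.

    All parameters are hypothetical: the download_* callables represent
    loader requests, and start_conversion represents launching the
    background ONNX -> TensorRT conversion.
    """
    # Step 1: a previously converted TensorRT engine takes priority.
    if converted_bytes is not None:
        return "tensorrt", converted_bytes

    # Step 4: without a GPU only the ONNX engine can run.
    if not gpu_available:
        return "onnx", download_onnx()

    # Step 2: try the pre-built TensorRT engine from the loader.
    try:
        return "tensorrt", download_trt()
    except Exception:
        # Step 3: fall back to ONNX and convert in the background.
        onnx_bytes = download_onnx()
        start_conversion(onnx_bytes)
        return "onnx", onnx_bytes
```

Passing the side effects in as callables keeps the branching testable without a GPU or a running loader service.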

Stream-Based Media Processing (AZ-173)

Both run_detect_image and run_detect_video accept raw bytes instead of file paths. This supports the distributed architecture where media arrives as HTTP uploads or is read from storage by the API layer.

Image Processing (run_detect_image)

  1. Decodes image bytes via cv2.imdecode
  2. Small images (≤1.5× model size): processed as single frame
  3. Large images: split into tiles based on GSD. Tile size = METERS_IN_TILE / GSD pixels. Tiles overlap by configurable percentage.
  4. Tile deduplication: absolute-coordinate comparison across adjacent tiles
  5. Size filtering: detections exceeding AnnotationClass.max_object_size_meters are removed
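
The tiling arithmetic in step 3 can be sketched as follows. The helper names (`tile_origins`, `make_tiles`, `to_absolute`), the rounding of the tile size, and the choice to align the last tile with the image edge are assumptions for illustration; the value passed as `meters_in_tile` in the usage note is arbitrary and not the module's `METERS_IN_TILE` constant.

```python
def tile_origins(length, tile, overlap_percent):
    """Start offsets of tiles covering `length` pixels along one axis."""
    if length <= tile:
        return [0]
    # Consecutive tiles advance by the tile size minus the overlap.
    step = max(1, int(tile * (1 - overlap_percent / 100)))
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # assumption: last tile is aligned to the image edge
    return origins


def make_tiles(width, height, gsd, meters_in_tile, overlap_percent):
    """Return (x, y, w, h) tiles; tile side in pixels = meters_in_tile / gsd."""
    tile = round(meters_in_tile / gsd)
    w, h = min(tile, width), min(tile, height)
    return [
        (x, y, w, h)
        for y in tile_origins(height, tile, overlap_percent)
        for x in tile_origins(width, tile, overlap_percent)
    ]


def to_absolute(box, tile_origin):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    ox, oy = tile_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

For example, `make_tiles(1000, 400, gsd=0.05, meters_in_tile=20, overlap_percent=25)` yields three 400-pixel tiles starting at x = 0, 300 and 600. Converting each detection with `to_absolute` before comparison is what makes duplicates from adjacent tiles comparable in step 4.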

Video Processing (run_detect_video)

  1. Concurrently writes raw bytes to save_path in a background thread (for persistent storage)
  2. Opens video from in-memory BytesIO via PyAV (av.open)
  3. Decodes frames via container.decode(vstream) — no temporary file needed for reading
  4. Frame sampling: every Nth frame (frame_period_recognition)
  5. Batch accumulation up to engine batch size
  6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
  7. Valid frames get JPEG-encoded image attached
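
Steps 1, 4 and 5 can be sketched with the standard library alone. The helpers `write_in_background` and `sampled_batches` are hypothetical names, and dropping the partial batch when a stop is requested is an assumption; in the module itself the frames would come from `container.decode(vstream)` rather than a plain iterable.

```python
import threading


def write_in_background(data: bytes, path: str) -> threading.Thread:
    """Persist the raw bytes on a worker thread; the caller may join() it later."""
    def _write():
        with open(path, "wb") as f:
            f.write(data)

    thread = threading.Thread(target=_write, daemon=True)
    thread.start()
    return thread


def sampled_batches(frames, frame_period, batch_size, should_stop=lambda: False):
    """Yield lists of (index, frame) holding every `frame_period`-th frame.

    `should_stop` is a hypothetical hook standing in for the stop_signal flag.
    """
    batch = []
    for index, frame in enumerate(frames):
        if should_stop():
            return  # assumption: an abort discards the unfinished batch
        if index % frame_period != 0:
            continue  # skip frames between sampling points
        batch.append((index, frame))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

Because `sampled_batches` is a generator, decoding and inference can interleave: each batch is handed to the engine as soon as it fills, without holding the whole video in decoded form.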

Ground Sampling Distance (GSD)

GSD = sensor_width * altitude / (focal_length * image_width) — meters per pixel, used for physical size filtering of aerial detections.
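
A worked example of the formula, as a minimal sketch. The units are assumptions (sensor width and focal length in millimetres, altitude in metres, image width in pixels), and the function names and `exceeds_max_size` helper are hypothetical.

```python
def ground_sampling_distance(sensor_width_mm, altitude_m, focal_length_mm, image_width_px):
    """Meters per pixel; the millimetre units cancel between sensor width and focal length."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)


def object_size_meters(size_px, gsd):
    """Physical extent of a detection that spans `size_px` pixels."""
    return size_px * gsd


def exceeds_max_size(size_px, gsd, max_object_size_meters):
    """True when a detection is physically larger than its class allows."""
    return object_size_meters(size_px, gsd) > max_object_size_meters
```

With a 10 mm sensor, a 10 mm focal length, a 4000-pixel-wide image and 100 m altitude, the GSD is 0.025 m per pixel, so an 80-pixel detection spans about 2 m on the ground.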

Dependencies

  • External: cv2, numpy, av (PyAV), io, threading
  • Internal: constants_inf, ai_availability_status, annotation, ai_config, tensorrt_engine (conditional), onnx_engine (conditional), inference_engine (type)

Consumers

  • main — lazy-initializes Inference, calls run_detect_image/run_detect_video, reads ai_availability_status and is_engine_ready

Data Models

Uses Detection, Annotation (from annotation), AIRecognitionConfig (from ai_config), AIAvailabilityStatus (from ai_availability_status).

Configuration

All runtime configuration is passed in as an AIRecognitionConfig built from a dict. Engine selection is automatic and depends on GPU availability, which is checked at module level via pynvml.

External Integrations

  • Loader service (via loader_client): model download/upload

Security

None.

Tests

  • tests/test_ai_config_from_dict.py — tests ai_config_from_dict helper
  • e2e/tests/test_video.py — exercises run_detect_video via the full API
  • e2e/tests/test_single_image.py — exercises run_detect_image via the full API