detections/_docs/02_document/modules/inference.md
Oleksandr Bezdieniezhnykh 7a7f2a4cdd [AZ-180] Update module and component docs for Jetson/INT8 changes
2026-04-02 07:25:22 +03:00


# Module: inference

## Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

## Public Interface

### Free Functions

| Function | Signature | Description |
| --- | --- | --- |
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |

### Class: Inference

#### Fields

| Field | Type | Access | Description |
| --- | --- | --- | --- |
| `loader_client` | `LoaderHttpClient` | internal | HTTP client for model download/upload |
| `engine` | `InferenceEngine` | internal | Active engine (`OnnxEngine` or `TensorRTEngine`); `None` if unavailable |
| `ai_availability_status` | `AIAvailabilityStatus` | public | Current AI readiness status |
| `stop_signal` | `bool` | internal | Flag to abort video processing |
| `detection_counts` | `dict[str, int]` | internal | Per-media detection count |
| `is_building_engine` | `bool` | internal | True during async TensorRT conversion |

#### Properties

| Property | Return Type | Description |
| --- | --- | --- |
| `is_engine_ready` | `bool` | True if `engine` is not `None` |
| `engine_name` | `str` or `None` | Engine type name from the active engine |

#### Methods

| Method | Signature | Access | Description |
| --- | --- | --- | --- |
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to `save_path` |
| `run_detect_video_stream` | `(object readable, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Processes video from a file-like readable (e.g. `StreamingBuffer`) via PyAV — true streaming, no bytes held in RAM (AZ-178) |
| `stop` | `()` | cpdef | Sets `stop_signal` to True |
| `init_ai` | `()` | cdef | Engine initialization: tries INT8 engine → FP16 engine → background TensorRT conversion (with optional INT8 calibration cache) |
| `_try_download_calib_cache` | `(str models_dir) -> str or None` | cdef | Downloads `azaion.int8_calib.cache` from Loader, writes it to a temp file; returns the path, or `None` if unavailable |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV `blobFromImage`: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output into `Detection` objects, applies confidence threshold and overlap removal |

## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If a GPU is available → try downloading a pre-built INT8 engine first (`*.int8.engine`), then an FP16 engine (`*.engine`) from the loader
3. If no cached engine is found → download the ONNX source, attempt to download the INT8 calibration cache (`azaion.int8_calib.cache`) from the loader, and spawn a background thread for ONNX→TensorRT conversion (INT8 if the cache was downloaded, FP16 fallback otherwise)
4. Calibration cache download failure is non-fatal — log a warning and proceed with FP16
5. The temporary calibration cache file is deleted after conversion completes
6. If no GPU → load `OnnxEngine` from the ONNX model bytes
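The fallback order above can be sketched as a pure decision function. This is an illustrative sketch only: `pick_engine` and its parameters are hypothetical stand-ins, not the real `init_ai` API, which also handles downloads and threading.

```python
def pick_engine(gpu_available, converted_bytes, int8_engine, fp16_engine, onnx_bytes):
    """Return a (kind, payload) pair describing which engine path is taken.

    Hypothetical sketch of the documented fallback order; the real init_ai
    also performs loader downloads and spawns the conversion thread.
    """
    if converted_bytes is not None:
        return ("tensorrt", converted_bytes)          # step 1: use converted bytes
    if gpu_available:
        if int8_engine is not None:
            return ("tensorrt-int8", int8_engine)     # step 2: pre-built INT8 engine
        if fp16_engine is not None:
            return ("tensorrt-fp16", fp16_engine)     # step 2: pre-built FP16 engine
        return ("convert-in-background", onnx_bytes)  # step 3: ONNX -> TensorRT
    return ("onnx", onnx_bytes)                       # step 6: CPU fallback
```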

### Stream-Based Media Processing (AZ-173)

Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture, where media arrives as HTTP uploads or is read from storage by the API layer.

### Image Processing (`run_detect_image`)

1. Decodes image bytes via `cv2.imdecode`
2. Small images (≤1.5× model size) are processed as a single frame
3. Large images are split into tiles based on GSD: tile size = `METERS_IN_TILE / GSD` pixels, with tiles overlapping by a configurable percentage
4. Tile deduplication: absolute-coordinate comparison across adjacent tiles
5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed
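The tile-size arithmetic can be illustrated with a small sketch. `tile_grid` is hypothetical and the `METERS_IN_TILE` value is a placeholder (the real constant lives in `constants_inf`); only the formula `tile = METERS_IN_TILE / GSD` comes from the text above.

```python
METERS_IN_TILE = 40.0  # placeholder value; the real constant is in constants_inf

def tile_grid(width, height, gsd, overlap=0.2):
    """Yield (x, y, w, h) tiles covering a width x height image.

    Tile side = METERS_IN_TILE / gsd pixels; adjacent tiles overlap by
    `overlap` (fraction of the tile side). Illustrative sketch only.
    """
    tile = max(1, int(METERS_IN_TILE / gsd))
    step = max(1, int(tile * (1.0 - overlap)))  # stride between tile origins
    for y in range(0, height, step):
        for x in range(0, width, step):
            # clamp the last row/column of tiles to the image border
            yield (x, y, min(tile, width - x), min(tile, height - y))
```

At a GSD of 1 m/px this yields 40 px tiles with a 32 px stride, so a 100x100 px image is covered by a 4x4 grid of overlapping tiles.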

### Video Processing (`run_detect_video`)

1. Concurrently writes the raw bytes to `save_path` in a background thread (for persistent storage)
2. Opens the video from an in-memory `BytesIO` via PyAV (`av.open`)
3. Decodes frames via `container.decode(vstream)` — no temporary file is needed for reading
4. Frame sampling: every Nth frame (`frame_period_recognition`)
5. Batch accumulation up to the engine batch size
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
7. Valid frames get a JPEG-encoded image attached
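Steps 4–5 (sampling every Nth frame and accumulating batches) can be sketched in isolation. `batched_frame_indices` is a hypothetical helper for illustration; the real code accumulates decoded frames inside `_process_video_pyav`, not indices.

```python
def batched_frame_indices(total_frames, frame_period, batch_size):
    """Group every Nth frame index into batches of at most batch_size.

    Mirrors the sampling/accumulation described above; pure-Python sketch.
    """
    batch, batches = [], []
    for idx in range(0, total_frames, frame_period):  # every Nth frame
        batch.append(idx)
        if len(batch) == batch_size:                  # batch is full: flush
            batches.append(batch)
            batch = []
    if batch:                                         # flush the final partial batch
        batches.append(batch)
    return batches
```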

### Streaming Video Processing (`run_detect_video_stream` — AZ-178)

1. Accepts a file-like readable object (e.g. `StreamingBuffer`) instead of bytes
2. Opens it directly via `av.open(readable)` — PyAV calls `read()`/`seek()` on the object
3. No writer thread is needed — the caller (API layer) manages disk persistence via the same buffer
4. Reuses `_process_video_pyav` for frame decoding, batch inference, and annotation delivery
5. For faststart MP4/MKV/WebM: frames are decoded as the bytes stream in (~500 ms latency)
6. For standard MP4 (moov atom at the end): PyAV's `seek(0, 2)` blocks until the buffer signals EOF, then decoding starts
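The readable contract PyAV relies on (`read`/`seek`/`seekable`) can be shown with a minimal stand-in. `GrowableBuffer` is hypothetical, not the real `StreamingBuffer`: unlike the real one it is not thread-safe and its `seek(0, 2)` does not block for EOF.

```python
import io

class GrowableBuffer(io.RawIOBase):
    """Minimal file-like object a writer can append to while a reader consumes it.

    Hypothetical sketch of the StreamingBuffer idea; av.open() only needs
    read()/seek()/seekable() on the object it is given.
    """

    def __init__(self):
        self._data = bytearray()
        self._pos = 0
        self._eof = False

    def feed(self, chunk):
        self._data += chunk          # writer side: append a received chunk

    def close_writer(self):
        self._eof = True             # signals EOF (the real buffer unblocks seek-to-end here)

    def read(self, size=-1):
        end = len(self._data) if size < 0 else min(self._pos + size, len(self._data))
        chunk = bytes(self._data[self._pos:end])
        self._pos = end
        return chunk

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        else:                        # io.SEEK_END; only meaningful once EOF is known
            self._pos = len(self._data) + offset
        return self._pos

    def seekable(self):
        return True
```

With PyAV installed, such an object could plausibly be passed as `av.open(buf)`, matching step 2 above.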

### Ground Sampling Distance (GSD)

GSD = `sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
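The formula can be checked with a worked example. The function name and the camera numbers below are illustrative, not taken from the module; units must simply be consistent (meters throughout).

```python
def gsd_meters_per_pixel(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """GSD = sensor_width * altitude / (focal_length * image_width)."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

# e.g. a 13.2 mm sensor at 100 m altitude, 8.8 mm focal length, 4000 px frame width:
# 0.0132 * 100 / (0.0088 * 4000) = 0.0375 m/px, i.e. 3.75 cm of ground per pixel
```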

## Dependencies

- External: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- Internal: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main` — lazy-initializes `Inference`, calls `run_detect_image`/`run_detect_video`/`run_detect_video_stream`, reads `ai_availability_status` and `is_engine_ready`

## Data Models

Uses `Detection`, `Annotation` (from `annotation`), `AIRecognitionConfig` (from `ai_config`), and `AIAvailabilityStatus` (from `ai_availability_status`).

## Configuration

All runtime configuration arrives via the `AIRecognitionConfig` dict. Engine selection is automatic, based on GPU availability (checked at module level via `pynvml`).

## External Integrations

- Loader service (via `loader_client`): model download/upload

## Security

None.

## Tests

- `tests/test_ai_config_from_dict.py` — tests the `ai_config_from_dict` helper
- `tests/test_az178_streaming_video.py` — tests `run_detect_video_stream` via the `/detect/video` endpoint and `StreamingBuffer`
- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API