detections/_docs/02_document/modules/inference.md
Module: inference

Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

Public Interface

Class: Inference

Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| loader_client | object | internal | LoaderHttpClient instance |
| engine | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine); None if unavailable |
| ai_availability_status | AIAvailabilityStatus | public | Current AI readiness status |
| stop_signal | bool | internal | Flag to abort video processing |
| model_width | int | internal | Model input width in pixels |
| model_height | int | internal | Model input height in pixels |
| detection_counts | dict[str, int] | internal | Per-media detection count |
| is_building_engine | bool | internal | True during async TensorRT conversion |

Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| __init__ | (loader_client) | public | Initializes state, calls init_ai() |
| run_detect | (dict config_dict, annotation_callback, status_callback=None) | cpdef | Main entry: parses config, separates images and videos, processes each |
| detect_single_image | (bytes image_bytes, dict config_dict) -> list | cpdef | Single-image detection from raw bytes; returns list[Detection] |
| stop | () | cpdef | Sets stop_signal to True |
| init_ai | () | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| preprocess | (frames) -> ndarray | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap BGR→RGB, stack batch |
| postprocess | (output, ai_config) -> list[list[Detection]] | cdef | Parses engine output into Detection objects; applies confidence threshold and overlap removal |

Internal Logic

Engine Initialization (init_ai)

  1. If _converted_model_bytes exists → load TensorRT from those bytes
  2. If GPU available → try downloading pre-built TensorRT engine from loader
  3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
  4. If no GPU → load OnnxEngine from ONNX model bytes
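
The fallback cascade above can be sketched as a pure routing function with all I/O injected as callables. The names and signature here are hypothetical, for illustration only; the real init_ai works on instance state and the loader client.

```python
def init_engine(gpu_available, converted_model_bytes, download_trt, download_onnx,
                load_trt, load_onnx, start_background_conversion):
    # 1. Previously converted TensorRT bytes take priority.
    if converted_model_bytes is not None:
        return load_trt(converted_model_bytes)
    if gpu_available:
        # 2. Try a pre-built TensorRT engine from the loader service.
        trt_bytes = download_trt()
        if trt_bytes is not None:
            return load_trt(trt_bytes)
        # 3. Download fails: run ONNX now, convert to TensorRT in the background.
        onnx_bytes = download_onnx()
        start_background_conversion(onnx_bytes)
        return load_onnx(onnx_bytes)
    # 4. No GPU: ONNX engine only.
    return load_onnx(download_onnx())
```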

Preprocessing

  • cv2.dnn.blobFromImage: scale 1/255, resize to model dims, BGR→RGB, no crop
  • Stack multiple frames via np.vstack for batched inference

Postprocessing

  • Engine output format: [batch][detection_index][x1, y1, x2, y2, confidence, class_id]
  • Coordinates normalized to 0..1 by dividing by model width/height
  • Converted to center-format (cx, cy, w, h) Detection objects
  • Filtered by probability_threshold
  • Overlapping detections removed via remove_overlapping_detections (greedy, keeps higher confidence)
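
A minimal sketch of this pipeline, using plain dicts in place of Detection objects and plain IoU for the overlap rule; the module's remove_overlapping_detections and its thresholds may differ:

```python
def _iou(a, b):
    # IoU of two center-format boxes given as dicts with cx, cy, w, h.
    ax1, ay1 = a["cx"] - a["w"] / 2, a["cy"] - a["h"] / 2
    ax2, ay2 = a["cx"] + a["w"] / 2, a["cy"] + a["h"] / 2
    bx1, by1 = b["cx"] - b["w"] / 2, b["cy"] - b["h"] / 2
    bx2, by2 = b["cx"] + b["w"] / 2, b["cy"] + b["h"] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0

def postprocess(output, model_w, model_h, prob_threshold=0.5, iou_threshold=0.5):
    # output: [batch][detection][x1, y1, x2, y2, confidence, class_id]
    results = []
    for batch in output:
        dets = []
        for x1, y1, x2, y2, conf, cls in batch:
            if conf < prob_threshold:  # confidence filter
                continue
            dets.append({
                "cx": (x1 + x2) / 2 / model_w,  # normalize to 0..1
                "cy": (y1 + y2) / 2 / model_h,  # and convert to center format
                "w": (x2 - x1) / model_w,
                "h": (y2 - y1) / model_h,
                "conf": conf,
                "cls": int(cls),
            })
        # Greedy overlap removal: sort by confidence descending, keep a
        # detection only if it does not overlap an already-kept one.
        dets.sort(key=lambda d: d["conf"], reverse=True)
        kept = []
        for d in dets:
            if all(_iou(d, k) < iou_threshold for k in kept):
                kept.append(d)
        results.append(kept)
    return results
```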

Image Processing

  • Small images (≤1.5× model size): processed as single frame
  • Large images: split into tiles based on ground sampling distance. Tile size = METERS_IN_TILE / GSD pixels. Tiles overlap by configurable percentage.
  • Tile deduplication: absolute-coordinate comparison across adjacent tiles using Detection.__eq__
  • Size filtering: detections whose physical size (meters) exceeds AnnotationClass.max_object_size_meters are removed. Physical size computed from GSD × pixel dimensions.
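
The tiling arithmetic above can be sketched like this; the METERS_IN_TILE value and overlap fraction are illustrative assumptions, not the module's constants:

```python
def tile_origins(length, tile, stride):
    # Tile start offsets along one axis, ensuring the last tile touches the edge.
    origins = list(range(0, max(length - tile, 0) + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins

def tile_grid(image_w, image_h, gsd, meters_in_tile=100.0, overlap=0.2):
    tile = int(meters_in_tile / gsd)            # tile edge in px = METERS_IN_TILE / GSD
    stride = max(1, int(tile * (1 - overlap)))  # adjacent tiles overlap by `overlap`
    return [(x, y, tile)
            for y in tile_origins(image_h, tile, stride)
            for x in tile_origins(image_w, tile, stride)]
```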

Video Processing

  • Frame sampling: every Nth frame (frame_period_recognition)
  • Batch accumulation up to engine batch size
  • Annotation validity: a frame's annotation must differ from the previous one in at least one of the following ways:
    • Time gap ≥ frame_recognition_seconds
    • More detections than previous
    • Any detection moved beyond tracking_distance_confidence threshold
    • Any detection confidence increased beyond tracking_probability_increase
  • Valid frames get JPEG-encoded image attached
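
A minimal sketch of the validity rule under simplifying assumptions: detections are paired positionally (the module's tracking presumably matches them properly), thresholds are illustrative defaults, and annotations are plain dicts with 'time' (seconds) and 'detections':

```python
def annotation_is_valid(prev, curr, frame_recognition_seconds=5.0,
                        tracking_distance=0.05, tracking_probability_increase=0.1):
    if prev is None:
        return True  # first annotation is always valid
    if curr["time"] - prev["time"] >= frame_recognition_seconds:
        return True  # enough time elapsed since the previous annotation
    if len(curr["detections"]) > len(prev["detections"]):
        return True  # more detections than before
    for c, p in zip(curr["detections"], prev["detections"]):
        moved = ((c["cx"] - p["cx"]) ** 2 + (c["cy"] - p["cy"]) ** 2) ** 0.5
        if moved > tracking_distance:
            return True  # a detection moved beyond the tracking threshold
        if c["conf"] - p["conf"] > tracking_probability_increase:
            return True  # a detection's confidence rose significantly
    return False
```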

Ground Sampling Distance (GSD)

GSD = sensor_width * altitude / (focal_length * image_width) — meters per pixel, used for physical size filtering of aerial detections.
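
Worked through with illustrative drone-camera values (not taken from the module), with all lengths in meters:

```python
def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    # GSD in meters per pixel: sensor_width * altitude / (focal_length * image_width)
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

# 13.2 mm sensor, 100 m altitude, 8.8 mm focal length, 5472 px wide image
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 5472)  # ~0.0274 m/px
```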

Dependencies

  • External: cv2, numpy, pynvml, mimetypes, pathlib, threading
  • Internal: constants_inf, ai_availability_status, annotation, ai_config, tensorrt_engine (conditional), onnx_engine (conditional), inference_engine (type)

Consumers

  • main — lazy-initializes Inference, calls run_detect, detect_single_image, reads ai_availability_status

Data Models

Uses Detection, Annotation (from annotation), AIRecognitionConfig (from ai_config), AIAvailabilityStatus (from ai_availability_status).

Configuration

All runtime configuration comes in via the AIRecognitionConfig dict. Engine selection is automatic, based on GPU availability (checked at module level via pynvml).

External Integrations

  • Loader service (via loader_client): model download/upload

Security

None.

Tests

None found.