detections/_docs/02_document/modules/inference.md
Oleksandr Bezdieniezhnykh 7a7f2a4cdd [AZ-180] Update module and component docs for Jetson/INT8 changes
Made-with: Cursor
2026-04-02 07:25:22 +03:00

# Module: inference
## Purpose
Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
## Public Interface
### Free Functions
| Function | Signature | Description |
|----------|-----------|-------------|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |
### Class: Inference
#### Fields
| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |
#### Properties
| Property | Return Type | Description |
|----------|-------------|-------------|
| `is_engine_ready` | bool | True if engine is not None |
| `engine_name` | str or None | Engine type name from the active engine |
#### Methods
| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
| `run_detect_video_stream` | `(object readable, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Processes video from a file-like readable (e.g. StreamingBuffer) via PyAV; decoding runs while data streams in, so the video is never held in RAM as a single `bytes` object (AZ-178) |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries INT8 engine → FP16 engine → background TensorRT conversion (with optional INT8 calibration cache) |
| `_try_download_calib_cache` | `(str models_dir) -> str or None` | cdef | Downloads `azaion.int8_calib.cache` from Loader; writes to a temp file; returns path or None if unavailable |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap R and B channels (BGR to RGB), stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
## Internal Logic
### Engine Initialization (`init_ai`)
1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built INT8 engine first (`*.int8.engine`), then FP16 engine (`*.engine`) from loader
3. If no pre-built engine found → download ONNX source, attempt to download INT8 calibration cache (`azaion.int8_calib.cache`) from loader, spawn background thread for ONNX→TensorRT conversion (INT8 if cache downloaded, FP16 fallback)
   - Calibration cache download failure is non-fatal: log a warning and proceed with FP16
   - The temporary calibration cache file is deleted after conversion completes
4. If no GPU → load OnnxEngine from ONNX model bytes
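The fallback order above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the module's code: `choose_engine_path`, the `download` callable (returns bytes, or `None` when the file is unavailable), and the `model_name` prefix are hypothetical names introduced here. Only the `.int8.engine` and `.engine` suffixes and the `azaion.int8_calib.cache` file name come from the steps above.

```python
def choose_engine_path(gpu_available, converted_model_bytes, download, model_name):
    """Return (engine_kind, precision) following the documented fallback order.

    `download(name)` is a hypothetical callable: bytes on success, None otherwise.
    """
    # Bytes from a finished background conversion take priority.
    if converted_model_bytes is not None:
        return ("tensorrt", "converted")

    # Without a GPU, fall back to the ONNX engine.
    if not gpu_available:
        return ("onnx", None)

    # Prefer a pre-built INT8 engine, then a pre-built FP16 engine.
    for suffix, precision in ((".int8.engine", "int8"), (".engine", "fp16")):
        if download(model_name + suffix) is not None:
            return ("tensorrt", precision)

    # No pre-built engine: convert in the background. A missing
    # calibration cache is non-fatal and just means FP16.
    calib = download("azaion.int8_calib.cache")
    return ("convert", "int8" if calib is not None else "fp16")


# Usage with an in-memory stand-in for the Loader service.
store = {"model.engine": b"fp16-engine"}
print(choose_engine_path(True, None, store.get, "model"))
```

Returning a plain tuple keeps the decision separate from the side effects (download, thread spawn, temp-file cleanup), which makes the order easy to test in isolation.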
### Stream-Based Media Processing (AZ-173)
Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture where media arrives as HTTP uploads or is read from storage by the API layer.
### Image Processing (`run_detect_image`)
1. Decodes image bytes via `cv2.imdecode`
2. Small images (≤1.5× model size): processed as a single frame
3. Large images: split into tiles based on GSD. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by a configurable percentage.
4. Tile deduplication: absolute-coordinate comparison across adjacent tiles
5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed
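A minimal sketch of the tiling arithmetic described above, assuming that "small" means the longer image side is at most 1.5× the model input size and that the last tile in each row or column is shifted to sit flush with the image edge. Both are assumptions, as are the helper names `tile_origins`, `plan_tiles`, and `to_absolute`; the module's actual tiling code may differ.

```python
def tile_origins(length_px, tile_px, overlap_percent):
    """Start offsets of tiles covering [0, length_px) along one axis."""
    if tile_px >= length_px:
        return [0]
    # Stride shrinks as overlap grows; clamp to 1 so the range always advances.
    stride = max(1, int(tile_px * (1 - overlap_percent / 100)))
    origins = list(range(0, length_px - tile_px, stride))
    origins.append(length_px - tile_px)  # last tile sits flush with the far edge
    return origins


def plan_tiles(width, height, gsd, meters_in_tile, model_size, overlap_percent):
    """Return (x, y, w, h) tiles, or one full-frame tile for small images."""
    # Assumption: "small" compares the longer side against 1.5x the model size.
    if max(width, height) <= 1.5 * model_size:
        return [(0, 0, width, height)]
    tile_px = max(1, round(meters_in_tile / gsd))  # Tile size = METERS_IN_TILE / GSD
    tile_w, tile_h = min(tile_px, width), min(tile_px, height)
    return [
        (x, y, tile_w, tile_h)
        for y in tile_origins(height, tile_px, overlap_percent)
        for x in tile_origins(width, tile_px, overlap_percent)
    ]


def to_absolute(box, tile):
    """Shift a tile-local (x1, y1, x2, y2) box into full-image coordinates."""
    x1, y1, x2, y2 = box
    tx, ty, _, _ = tile
    return (x1 + tx, y1 + ty, x2 + tx, y2 + ty)


# 20 m tiles at 0.05 m/px give 400 px tiles; 25% overlap gives a 300 px stride.
tiles = plan_tiles(1000, 800, gsd=0.05, meters_in_tile=20, model_size=640, overlap_percent=25)
print(len(tiles), tiles[0], tiles[-1])
```

Once boxes from every tile are shifted with `to_absolute`, detections of the same object seen in two overlapping tiles land at nearly the same coordinates, which is what makes the absolute-coordinate comparison in step 4 possible.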
### Video Processing (`run_detect_video`)
1. Concurrently writes raw bytes to `save_path` in a background thread (for persistent storage)
2. Opens video from in-memory `BytesIO` via PyAV (`av.open`)
3. Decodes frames via `container.decode(vstream)` — no temporary file needed for reading
4. Frame sampling: every Nth frame (`frame_period_recognition`)
5. Batch accumulation up to engine batch size
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
7. Valid frames get JPEG-encoded image attached
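Steps 4 and 5 (frame sampling and batch accumulation) can be illustrated with a small generator. This is a sketch, not the module's implementation: `batched_sampled_frames` is a hypothetical name, and starting the sampling at frame index 0 is an assumption. The `frame_period` argument stands in for `frame_period_recognition`, and `batch_size` for the engine batch size.

```python
def batched_sampled_frames(frames, frame_period, batch_size):
    """Yield lists of (frame_index, frame), keeping every `frame_period`-th frame."""
    batch = []
    for index, frame in enumerate(frames):
        # Skip frames that fall between sampling points.
        if index % frame_period != 0:
            continue
        batch.append((index, frame))
        # Hand a full batch to the caller, then start a new one.
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush the final, possibly smaller, batch at end of stream.
    if batch:
        yield batch


# Any iterable of decoded frames works; integers stand in for frames here.
for batch in batched_sampled_frames(range(8), frame_period=3, batch_size=2):
    print([index for index, _ in batch])
```

Because it consumes a plain iterable, the same generator shape would fit frames coming from `container.decode(vstream)` without buffering the whole video.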
### Streaming Video Processing (`run_detect_video_stream` — AZ-178)
1. Accepts a file-like `readable` object (e.g. `StreamingBuffer`) instead of `bytes`
2. Opens directly via `av.open(readable)` — PyAV calls `read()`/`seek()` on the object
3. No writer thread needed — the caller (API layer) manages disk persistence via the same buffer
4. Reuses `_process_video_pyav` for frame decoding, batch inference, and annotation delivery
5. For faststart MP4/MKV/WebM: frames are decoded as the bytes arrive (~500 ms latency)
6. For standard MP4 (moov at end): PyAV's `seek(0, 2)` blocks until the buffer signals EOF, then decoding starts
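The blocking behaviour in steps 5 and 6 follows from how a file-like object can be written. The class below is a toy stand-in, not the real `StreamingBuffer` (whose implementation is not shown in this document); the name `ToyStreamingBuffer` and its `append`/`close_writer` methods are hypothetical. It shows one way `read()` can return data as soon as it arrives while `seek(0, 2)` must wait for EOF, because the end position is unknown until the upload finishes.

```python
import threading


class ToyStreamingBuffer:
    """File-like buffer fed by a producer thread and read by a consumer."""

    def __init__(self):
        self._data = bytearray()
        self._pos = 0
        self._eof = False
        self._cond = threading.Condition()

    def append(self, chunk):
        """Producer side: add bytes and wake any waiting reader."""
        with self._cond:
            self._data.extend(chunk)
            self._cond.notify_all()

    def close_writer(self):
        """Producer side: signal that no more bytes will arrive."""
        with self._cond:
            self._eof = True
            self._cond.notify_all()

    def read(self, size=-1):
        with self._cond:
            if size is None or size < 0:
                # Read-to-end needs the whole stream.
                self._cond.wait_for(lambda: self._eof)
                end = len(self._data)
            else:
                # Return as soon as any unread byte exists, or at EOF.
                self._cond.wait_for(lambda: self._eof or len(self._data) > self._pos)
                end = min(self._pos + size, len(self._data))
            chunk = bytes(self._data[self._pos:end])
            self._pos = max(self._pos, end)
            return chunk

    def seek(self, offset, whence=0):
        with self._cond:
            if whence == 2:
                # The end is only known once the writer has finished.
                self._cond.wait_for(lambda: self._eof)
                self._pos = len(self._data) + offset
            elif whence == 1:
                self._pos += offset
            else:
                self._pos = offset
            return self._pos

    def tell(self):
        return self._pos


buf = ToyStreamingBuffer()
writer = threading.Thread(target=lambda: (buf.append(b"header+frames"), buf.close_writer()))
writer.start()
print(buf.seek(0, 2))  # blocks until close_writer() has run
writer.join()
```

With a faststart file the demuxer only needs sequential `read()` calls, so decoding can begin early; a file with `moov` at the end forces the seek-to-end path and therefore the wait.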
### Ground Sampling Distance (GSD)
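`GSD = sensor_width * altitude / (focal_length * image_width)`, in meters per pixel, used for physical size filtering of aerial detections. `sensor_width` and `focal_length` must share the same unit, `altitude` is in meters, and `image_width` is in pixels.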
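As a worked example of the formula, the sketch below computes GSD and converts a bounding box from pixels to meters. The function names and the sample camera values are illustrative assumptions; using the longer box side as the object size is also an assumption about how the `max_object_size_meters` check is applied.

```python
def ground_sampling_distance(sensor_width, altitude_m, focal_length, image_width_px):
    """Meters per pixel; sensor_width and focal_length must share a unit."""
    if focal_length <= 0 or image_width_px <= 0:
        raise ValueError("focal_length and image_width_px must be positive")
    return sensor_width * altitude_m / (focal_length * image_width_px)


def object_size_meters(bbox_width_px, bbox_height_px, gsd):
    """Physical size of a detection, taken here as its longer side (assumption)."""
    return max(bbox_width_px, bbox_height_px) * gsd


# Illustrative values: 12 mm sensor, 24 mm lens, 100 m altitude, 4000 px wide image.
gsd = ground_sampling_distance(sensor_width=12.0, altitude_m=100.0,
                               focal_length=24.0, image_width_px=4000)
size = object_size_meters(80, 40, gsd)
print(f"GSD = {gsd:.4f} m/px, object = {size:.2f} m")
```

A detection would then be dropped when `size` exceeds the class limit from `AnnotationClass.max_object_size_meters`.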
## Dependencies
- **External**: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)
## Consumers
- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`/`run_detect_video_stream`, reads `ai_availability_status` and `is_engine_ready`
## Data Models
Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).
## Configuration
All runtime config comes via `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module level via pynvml).
## External Integrations
- **Loader service** (via loader_client): model download/upload, INT8 calibration cache download
## Security
None.
## Tests
- `tests/test_ai_config_from_dict.py` — tests `ai_config_from_dict` helper
- `tests/test_az178_streaming_video.py` — tests `run_detect_video_stream` via the `/detect/video` endpoint and `StreamingBuffer`
- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API