mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 21:36:32 +00:00
7a7f2a4cdd
Made-with: Cursor
# Module: inference

## Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

## Public Interface

### Free Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |
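
The wrapper pattern can be sketched as follows. The `AIRecognitionConfig` stand-in below is hypothetical (the real fields live in `ai_config`); it only illustrates the dict-to-config conversion that `ai_config_from_dict` exposes to Python callers.

```python
from dataclasses import dataclass

# Hypothetical stand-in for AIRecognitionConfig; real fields live in ai_config.
@dataclass
class AIRecognitionConfig:
    confidence_threshold: float = 0.5
    frame_period_recognition: int = 5

    @classmethod
    def from_dict(cls, data: dict) -> "AIRecognitionConfig":
        # Keep only known keys so unexpected dict entries are ignored
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)

def ai_config_from_dict(data: dict) -> AIRecognitionConfig:
    # Python-callable wrapper, mirroring the documented free function
    return AIRecognitionConfig.from_dict(data)

cfg = ai_config_from_dict({"confidence_threshold": 0.8, "unknown_key": 1})
print(cfg.confidence_threshold)  # 0.8
```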

### Class: Inference

#### Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |

#### Properties

| Property | Return Type | Description |
|----------|-------------|-------------|
| `is_engine_ready` | bool | True if `engine` is not None |
| `engine_name` | str or None | Engine type name from the active engine |

#### Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
| `run_detect_video_stream` | `(object readable, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Processes video from a file-like readable (e.g. StreamingBuffer) via PyAV — true streaming, no bytes in RAM (AZ-178) |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries INT8 engine → FP16 engine → background TensorRT conversion (with optional INT8 calibration cache) |
| `_try_download_calib_cache` | `(str models_dir) -> str or None` | cdef | Downloads `azaion.int8_calib.cache` from Loader; writes to a temp file; returns path or None if unavailable |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
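
A typical caller lifecycle (construct, check readiness, detect, stop) might look like the following. This is a simplified stand-in, not the real class: names mirror the documented interface, but the bodies are stubs (no real engine, decoding, or callbacks are wired up).

```python
# Hypothetical, simplified stand-in illustrating the documented lifecycle.
class Inference:
    def __init__(self, loader_client):
        self.loader_client = loader_client
        self.engine = None
        self.stop_signal = False
        self.detection_counts = {}
        self.init_ai()                      # __init__ triggers engine setup

    @property
    def is_engine_ready(self):
        return self.engine is not None

    def init_ai(self):
        # Real code tries INT8 -> FP16 -> background TensorRT build -> ONNX;
        # here we just pretend a CPU ONNX engine loaded.
        self.engine = "OnnxEngine"

    def run_detect_image(self, image_bytes, ai_config, media_name, annotation_callback):
        # Real code decodes, tiles, batches, and postprocesses; stubbed here.
        self.detection_counts[media_name] = 0
        annotation_callback(media_name, [])

    def stop(self):
        self.stop_signal = True             # aborts video processing loops

inf = Inference(loader_client=None)
assert inf.is_engine_ready
inf.run_detect_image(b"...", None, "frame.jpg", lambda name, dets: None)
inf.stop()
```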

## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built INT8 engine first (`*.int8.engine`), then FP16 engine (`*.engine`) from loader
3. If no cached engine found → download ONNX source, attempt to download INT8 calibration cache (`azaion.int8_calib.cache`) from loader, spawn background thread for ONNX→TensorRT conversion (INT8 if cache downloaded, FP16 fallback)
4. Calibration cache download failure is non-fatal — log warning and proceed with FP16
5. Temporary calibration cache file is deleted after conversion completes
6. If no GPU → load OnnxEngine from ONNX model bytes
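
The fallback chain above can be sketched as a pure function. All helpers here (`try_download`, `build_trt_async`, `load_onnx`) are hypothetical stand-ins for the loader and builder calls, and the artifact names are illustrative:

```python
# Sketch of the documented fallback chain; helpers are hypothetical stand-ins.
def init_ai(gpu_available, converted_model_bytes, try_download, build_trt_async, load_onnx):
    if converted_model_bytes is not None:
        return ("tensorrt", converted_model_bytes)       # 1. reuse converted bytes
    if gpu_available:
        engine = try_download("model.int8.engine")       # 2. pre-built INT8 first
        if engine is None:
            engine = try_download("model.engine")        #    then FP16
        if engine is not None:
            return ("tensorrt", engine)
        onnx = try_download("model.onnx")                # 3. fall back to ONNX source
        calib = try_download("azaion.int8_calib.cache")  # 4. calib cache is optional
        build_trt_async(onnx, calib)                     #    background conversion
        return ("building", None)
    return ("onnx", load_onnx())                         # 6. CPU path

# Usage: simulate the loader with a dict; no cached engine is available.
downloads = {"model.onnx": b"onnx-bytes"}
state, payload = init_ai(
    gpu_available=True,
    converted_model_bytes=None,
    try_download=downloads.get,
    build_trt_async=lambda onnx, calib: None,
    load_onnx=lambda: b"onnx-bytes",
)
print(state)  # "building": no cached engine, background conversion kicked off
```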

### Stream-Based Media Processing (AZ-173)

Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture where media arrives as HTTP uploads or is read from storage by the API layer.

### Image Processing (`run_detect_image`)

1. Decodes image bytes via `cv2.imdecode`
2. Small images (≤1.5× model size): processed as single frame
3. Large images: split into tiles based on GSD. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
4. Tile deduplication: absolute-coordinate comparison across adjacent tiles
5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed
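
The GSD-based tile rule in step 3 can be sketched as follows. The `METERS_IN_TILE` value and the helper name are assumptions (the real constant lives in `constants_inf`); only the size and overlap arithmetic follows the text above.

```python
# Sketch of GSD-based tiling with overlap. METERS_IN_TILE = 100.0 is an
# assumed illustrative value; the real constant lives in constants_inf.
METERS_IN_TILE = 100.0

def tile_origins(width, height, gsd, overlap=0.2):
    """Return (x, y, size) for overlapping square tiles covering the image."""
    size = int(METERS_IN_TILE / gsd)            # tile edge in pixels
    step = max(1, int(size * (1.0 - overlap)))  # stride shrinks with overlap
    xs = list(range(0, max(width - size, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - size, 0) + 1, step)) or [0]
    # Make sure the right/bottom edges are always covered
    if xs[-1] + size < width:
        xs.append(width - size)
    if ys[-1] + size < height:
        ys.append(height - size)
    return [(x, y, size) for y in ys for x in xs]

# 2000x1500 px image at 0.1 m/px GSD -> 1000 px tiles, 20% overlap
tiles = tile_origins(width=2000, height=1500, gsd=0.1, overlap=0.2)
print(len(tiles), tiles[0])  # 6 (0, 0, 1000)
```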

### Video Processing (`run_detect_video`)

1. Concurrently writes raw bytes to `save_path` in a background thread (for persistent storage)
2. Opens video from in-memory `BytesIO` via PyAV (`av.open`)
3. Decodes frames via `container.decode(vstream)` — no temporary file needed for reading
4. Frame sampling: every Nth frame (`frame_period_recognition`)
5. Batch accumulation up to engine batch size
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
7. Valid frames get JPEG-encoded image attached
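
Steps 4 and 5 (sampling every Nth frame, flushing batches at engine batch size) can be sketched with stand-ins for the frame source and batch runner:

```python
# Sketch of the sampling/batching loop; frame source and run_batch are stand-ins.
def sample_and_batch(frames, frame_period, batch_size, run_batch):
    batch, batch_ids = [], []
    for idx, frame in enumerate(frames):
        if idx % frame_period != 0:     # 4. frame sampling: every Nth frame
            continue
        batch.append(frame)
        batch_ids.append(idx)
        if len(batch) == batch_size:    # 5. flush at engine batch size
            run_batch(batch_ids, batch)
            batch, batch_ids = [], []
    if batch:                           # flush trailing partial batch
        run_batch(batch_ids, batch)

processed = []
sample_and_batch(range(10), frame_period=3, batch_size=2,
                 run_batch=lambda ids, b: processed.append(list(ids)))
print(processed)  # [[0, 3], [6, 9]]
```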

### Streaming Video Processing (`run_detect_video_stream` — AZ-178)

1. Accepts a file-like `readable` object (e.g. `StreamingBuffer`) instead of `bytes`
2. Opens directly via `av.open(readable)` — PyAV calls `read()`/`seek()` on the object
3. No writer thread needed — the caller (API layer) manages disk persistence via the same buffer
4. Reuses `_process_video_pyav` for frame decoding, batch inference, and annotation delivery
5. For faststart MP4/MKV/WebM: frames are decoded as the bytes stream in (~500 ms latency)
6. For standard MP4 (moov at end): PyAV's `seek(0, 2)` blocks until the buffer signals EOF, then decoding starts
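
The file-like contract PyAV relies on (blocking `read()`, a `seek()` that can wait for EOF in the moov-at-end case) might look like this minimal stand-in; the real `StreamingBuffer` lives in the API layer, so its internals here are assumptions:

```python
import io
import threading

# Minimal stand-in for a StreamingBuffer-like object: read() blocks until
# data arrives or the writer signals EOF; seek-to-end waits for EOF.
class StreamingBuffer:
    def __init__(self):
        self._buf = io.BytesIO()
        self._pos = 0
        self._eof = False
        self._cond = threading.Condition()

    def feed(self, chunk: bytes):
        with self._cond:
            self._buf.seek(0, io.SEEK_END)
            self._buf.write(chunk)
            self._cond.notify_all()

    def close_writer(self):
        with self._cond:
            self._eof = True
            self._cond.notify_all()

    def read(self, n=-1):
        with self._cond:
            while True:
                end = self._buf.seek(0, io.SEEK_END)
                if self._pos < end or self._eof:
                    break
                self._cond.wait()           # block until more data or EOF
            self._buf.seek(self._pos)
            data = self._buf.read(n if n >= 0 else None)
            self._pos = self._buf.tell()
            return data

    def seek(self, offset, whence=io.SEEK_SET):
        with self._cond:
            if whence == io.SEEK_END:       # moov-at-end case: wait for EOF
                while not self._eof:
                    self._cond.wait()
                self._pos = self._buf.seek(0, io.SEEK_END) + offset
            elif whence == io.SEEK_CUR:
                self._pos += offset
            else:
                self._pos = offset
            return self._pos

buf = StreamingBuffer()
buf.feed(b"hello ")
buf.feed(b"world")
buf.close_writer()
print(buf.read())  # b'hello world'
```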

### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
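
A worked example of the formula above, with illustrative (not module-specific) sensor numbers:

```python
# GSD = sensor_width * altitude / (focal_length * image_width), in meters per
# pixel. Numbers below are illustrative: 13.2 mm sensor, 8.8 mm focal length,
# 4000 px wide image, flying at 120 m.
def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    # meters of ground covered by one image pixel
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

gsd = ground_sampling_distance(
    sensor_width_m=0.0132, altitude_m=120.0,
    focal_length_m=0.0088, image_width_px=4000,
)
print(round(gsd, 4))  # 0.045, i.e. ~4.5 cm of ground per pixel
```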

## Dependencies

- **External**: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`/`run_detect_video_stream`, reads `ai_availability_status` and `is_engine_ready`

## Data Models

Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).

## Configuration

All runtime config comes via the `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module level via pynvml).
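
A module-level GPU probe of the kind described above might look like this. The exact probe in the module is not shown in this doc, so the function below is a plausible sketch: it uses the public NVML bindings and degrades to the CPU path when `pynvml` or a device is absent.

```python
# Sketch of a module-level GPU availability probe via pynvml (NVML bindings).
def detect_gpu() -> bool:
    try:
        import pynvml
        pynvml.nvmlInit()
        count = pynvml.nvmlDeviceGetCount()
        pynvml.nvmlShutdown()
        return count > 0
    except Exception:   # no pynvml installed, no driver, or no device
        return False

GPU_AVAILABLE = detect_gpu()
print(GPU_AVAILABLE)
```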

## External Integrations

- **Loader service** (via loader_client): model download/upload

## Security

None.

## Tests

- `tests/test_ai_config_from_dict.py` — tests the `ai_config_from_dict` helper
- `tests/test_az178_streaming_video.py` — tests `run_detect_video_stream` via the `/detect/video` endpoint and `StreamingBuffer`
- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API