[AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow

- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
2026-06-21 11:01:08 +00:00 · 2026-03-31 17:25:58 +03:00
parent e29606c313
commit 1fe9425aa8
12 changed files with 447 additions and 245 deletions
@@ -6,32 +6,43 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me

 ## Public Interface

+### Free Functions
+
+| Function | Signature | Description |
+|----------|-----------|-------------|
+| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |
+
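In a Cython module, a module-level function is always importable from Python, while a `cdef`-level method is not. That is a likely reason for a thin wrapper like `ai_config_from_dict`. The pure-Python sketch below illustrates the wrapper pattern only. It assumes `AIRecognitionConfig.from_dict` ignores unknown keys and fills in defaults. The field names are taken from this doc, but the default values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class AIRecognitionConfig:
    # Field names appear elsewhere in this doc; the default values are hypothetical.
    probability_threshold: float = 0.25
    frame_period_recognition: int = 4
    frame_recognition_seconds: float = 2.0
    tracking_distance_confidence: float = 0.1
    tracking_probability_increase: float = 0.15

    @staticmethod
    def from_dict(data: dict) -> "AIRecognitionConfig":
        # Keep only known keys so extra request fields do not raise TypeError.
        known = {f.name for f in fields(AIRecognitionConfig)}
        return AIRecognitionConfig(**{k: v for k, v in data.items() if k in known})


def ai_config_from_dict(data: dict) -> AIRecognitionConfig:
    """Module-level forwarder that plain Python callers can import directly."""
    return AIRecognitionConfig.from_dict(data)


cfg = ai_config_from_dict({"probability_threshold": 0.5, "unknown_key": 1})
```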
 ### Class: Inference

 #### Fields

 | Field | Type | Access | Description |
 |-------|------|--------|-------------|
-| `loader_client` | object | internal | LoaderHttpClient instance |
+| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
 | `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
 | `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
 | `stop_signal` | bool | internal | Flag to abort video processing |
-| `model_width` | int | internal | Model input width in pixels |
-| `model_height` | int | internal | Model input height in pixels |
 | `detection_counts` | dict[str, int] | internal | Per-media detection count |
 | `is_building_engine` | bool | internal | True during async TensorRT conversion |

+#### Properties
+
+| Property | Return Type | Description |
+|----------|-------------|-------------|
+| `is_engine_ready` | bool | True if engine is not None |
+| `engine_name` | str or None | Type name of the active engine; None when no engine is loaded |
+
 #### Methods

 | Method | Signature | Access | Description |
 |--------|-----------|--------|-------------|
 | `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
-| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
-| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
+| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
+| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
 | `stop` | `()` | cpdef | Sets stop_signal to True |
-| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
-| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
-| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
+| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
+| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
+| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to Detection objects, applies confidence threshold and overlap removal |

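The `postprocess` contract can be sketched in plain Python. This sketch assumes the per-row output layout `[x1, y1, x2, y2, confidence, class_id]` in model-input pixels, which is the layout given in this diff's removed Postprocessing section. It also uses a simplified `Detection` dataclass rather than the real `annotation.Detection`. The overlap criterion of `remove_overlapping_detections` is not documented, so this sketch substitutes an IoU test with a hypothetical `overlap_threshold`. Overlap removal here is class-agnostic, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    # Simplified stand-in: center-format box normalized to 0..1.
    cx: float
    cy: float
    w: float
    h: float
    cls: int
    confidence: float


def iou(a: Detection, b: Detection) -> float:
    # Intersection-over-union of two center-format boxes.
    ax1, ay1, ax2, ay2 = a.cx - a.w / 2, a.cy - a.h / 2, a.cx + a.w / 2, a.cy + a.h / 2
    bx1, by1, bx2, by2 = b.cx - b.w / 2, b.cy - b.h / 2, b.cx + b.w / 2, b.cy + b.h / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping(dets: List[Detection], overlap_threshold: float = 0.5) -> List[Detection]:
    # Greedy: visit by descending confidence; keep a box only if it does not
    # overlap an already-kept box. overlap_threshold is hypothetical.
    kept: List[Detection] = []
    for d in sorted(dets, key=lambda d: d.confidence, reverse=True):
        if all(iou(d, k) <= overlap_threshold for k in kept):
            kept.append(d)
    return kept


def postprocess(output, model_w: int, model_h: int, probability_threshold: float):
    results = []
    for batch_rows in output:
        dets = []
        for x1, y1, x2, y2, conf, class_id in batch_rows:
            if conf < probability_threshold:
                continue
            # Normalize by model input size, then convert corners to center format.
            nx1, ny1, nx2, ny2 = x1 / model_w, y1 / model_h, x2 / model_w, y2 / model_h
            dets.append(Detection((nx1 + nx2) / 2, (ny1 + ny2) / 2,
                                  nx2 - nx1, ny2 - ny1, int(class_id), conf))
        results.append(remove_overlapping(dets))
    return results


rows = [[[0, 0, 64, 64, 0.9, 1], [4, 4, 68, 68, 0.6, 1], [300, 300, 340, 340, 0.2, 0]]]
out = postprocess(rows, 640, 640, probability_threshold=0.25)
```

In this usage, the second box heavily overlaps the first and is dropped, and the third falls below the threshold, so one detection remains.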
 ## Internal Logic

@@ -42,36 +53,27 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me
 3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
 4. If no GPU → load OnnxEngine from ONNX model bytes

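The fallback chain in `init_ai` can be sketched with injected loader callables. Every name in this sketch is hypothetical, and it does not show the real module's API. It assumes the GPU check comes first, that a failed TensorRT download returns `None`, and that `engine` stays `None` until the background conversion finishes. This doc does not say which engine, if any, serves requests during the conversion.

```python
import threading
from typing import Callable, Optional


class EngineInitSketch:
    """Hypothetical stand-in for Inference.init_ai; all loaders are injected."""

    def __init__(self, has_gpu: bool,
                 download_trt: Callable[[], Optional[bytes]],
                 download_onnx: Callable[[], bytes],
                 convert_onnx_to_trt: Callable[[bytes], bytes],
                 make_trt_engine: Callable[[bytes], object],
                 make_onnx_engine: Callable[[bytes], object]):
        self.engine = None               # None until an engine is ready
        self.is_building_engine = False  # True while background conversion runs
        self._worker: Optional[threading.Thread] = None
        self._has_gpu = has_gpu
        self._download_trt = download_trt
        self._download_onnx = download_onnx
        self._convert = convert_onnx_to_trt
        self._make_trt = make_trt_engine
        self._make_onnx = make_onnx_engine
        self.init_ai()

    def init_ai(self) -> None:
        if not self._has_gpu:
            # No GPU: run the ONNX model directly.
            self.engine = self._make_onnx(self._download_onnx())
            return
        trt_bytes = self._download_trt()  # assumed to return None on failure
        if trt_bytes is not None:
            self.engine = self._make_trt(trt_bytes)
            return
        # No prebuilt TensorRT engine: fetch ONNX and convert it on a worker thread.
        onnx_bytes = self._download_onnx()
        self.is_building_engine = True
        self._worker = threading.Thread(target=self._build, args=(onnx_bytes,), daemon=True)
        self._worker.start()

    def _build(self, onnx_bytes: bytes) -> None:
        try:
            self.engine = self._make_trt(self._convert(onnx_bytes))
        finally:
            self.is_building_engine = False


building = EngineInitSketch(
    has_gpu=True,
    download_trt=lambda: None,  # simulate a failed engine download
    download_onnx=lambda: b"onnx",
    convert_onnx_to_trt=lambda b: b"trt:" + b,
    make_trt_engine=lambda b: ("trt", b),
    make_onnx_engine=lambda b: ("onnx", b),
)
building._worker.join()
```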
-### Preprocessing
+### Stream-Based Media Processing (AZ-173)

- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
+Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture where media arrives as HTTP uploads or is read from storage by the API layer.

-### Postprocessing
+### Image Processing (`run_detect_image`)

- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
+1. Decodes image bytes via `cv2.imdecode`
+2. Small images (≤1.5× model size): processed as single frame
+3. Large images: split into tiles based on GSD. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
+4. Tile deduplication: detections from adjacent tiles are compared in absolute image coordinates using `Detection.__eq__`
+5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed

-### Image Processing
+### Video Processing (`run_detect_video`)

- Small images (≤1.5× model size): processed as single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size computed from GSD × pixel dimensions.
-
-### Video Processing
-
- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity: must differ from previous annotation by either:
-  - Time gap ≥ `frame_recognition_seconds`
-  - More detections than previous
-  - Any detection moved beyond `tracking_distance_confidence` threshold
-  - Any detection confidence increased beyond `tracking_probability_increase`
- Valid frames get JPEG-encoded image attached
+1. Concurrently writes raw bytes to `save_path` in a background thread (for persistent storage)
+2. Opens video from in-memory `BytesIO` via PyAV (`av.open`)
+3. Decodes frames via `container.decode(vstream)` — no temporary file needed for reading
+4. Frame sampling: every Nth frame (`frame_period_recognition`)
+5. Batch accumulation up to engine batch size
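+6. Annotation validity heuristics: an annotation is valid only if it differs from the previous one by at least one of: time gap ≥ `frame_recognition_seconds`, more detections, any detection moved beyond `tracking_distance_confidence`, or any detection's confidence increased beyond `tracking_probability_increase`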
+7. Valid frames get JPEG-encoded image attached
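
The concurrent save and the frame sampling/batching steps can be sketched with the standard library. The helper names are hypothetical. The sketch assumes sampling starts at frame 0 and that a trailing partial batch is still sent to the engine. `range(10)` stands in for the PyAV frame iterator, so the example runs without PyAV installed.

```python
import os
import tempfile
import threading
from typing import Iterable, Iterator, List


def save_in_background(video_bytes: bytes, save_path: str) -> threading.Thread:
    # Write the upload to persistent storage without blocking frame decoding.
    def _write() -> None:
        with open(save_path, "wb") as f:
            f.write(video_bytes)

    writer = threading.Thread(target=_write, daemon=True)
    writer.start()
    return writer


def sampled_batches(frames: Iterable, frame_period: int, batch_size: int) -> Iterator[List]:
    # Keep every Nth frame (frame_period_recognition) and group kept frames into engine-sized batches.
    batch: List = []
    for index, frame in enumerate(frames):
        if index % frame_period != 0:
            continue
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


save_path = os.path.join(tempfile.mkdtemp(), "upload.mp4")
writer = save_in_background(b"\x00" * 1024, save_path)
# Stand-in for PyAV decoding; per this doc the real frames come from
# av.open(io.BytesIO(video_bytes)) followed by container.decode(vstream).
batches = list(sampled_batches(range(10), frame_period=3, batch_size=2))
writer.join()
```

Because decoding reads from the in-memory bytes, the writer thread and the decode loop never contend for the same file.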

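The validity heuristics in step 6 can be sketched as a single predicate. The four criteria and the parameter names come from this doc. The doc does not say how detections are paired across frames, so this sketch makes three assumptions:

- Each current detection is compared with the nearest previous detection of the same class, using normalized coordinates.
- A detection with no same-class counterpart counts as having moved.
- The first annotation is always valid.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Det:
    cx: float          # normalized center x
    cy: float          # normalized center y
    cls: int
    confidence: float


@dataclass
class Annotation:
    time_s: float
    detections: List[Det]


def is_valid_annotation(cur: Annotation, prev: Optional[Annotation],
                        frame_recognition_seconds: float,
                        tracking_distance_confidence: float,
                        tracking_probability_increase: float) -> bool:
    if prev is None:
        return True  # assumption: nothing to compare against
    if cur.time_s - prev.time_s >= frame_recognition_seconds:
        return True
    if len(cur.detections) > len(prev.detections):
        return True
    for d in cur.detections:
        same_cls = [p for p in prev.detections if p.cls == d.cls]
        if not same_cls:
            return True  # assumption: no counterpart is treated as movement
        nearest = min(same_cls, key=lambda p: math.hypot(d.cx - p.cx, d.cy - p.cy))
        if math.hypot(d.cx - nearest.cx, d.cy - nearest.cy) > tracking_distance_confidence:
            return True
        if d.confidence - nearest.confidence > tracking_probability_increase:
            return True
    return False


prev = Annotation(0.0, [Det(0.5, 0.5, 1, 0.6)])
params = dict(frame_recognition_seconds=2.0, tracking_distance_confidence=0.1,
              tracking_probability_increase=0.15)
```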
 ### Ground Sampling Distance (GSD)

@@ -79,12 +81,12 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me

 ## Dependencies

- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
+- **External**: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
 - **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

 ## Consumers

- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`
+- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`, reads `ai_availability_status` and `is_engine_ready`

 ## Data Models

@@ -104,4 +106,6 @@ None.

 ## Tests

-None found.
+- `tests/test_ai_config_from_dict.py` — tests `ai_config_from_dict` helper
+- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
+- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API