[AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow

- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-03-31 17:25:58 +03:00
parent e29606c313
commit 1fe9425aa8
12 changed files with 447 additions and 245 deletions
@@ -2,19 +2,20 @@
## Overview
-**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
+**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, stream-based media preprocessing (images + video from bytes), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
-**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
+**Downstream**: API (creates Inference, calls `run_detect_image` and `run_detect_video`).
## Modules
| Module | Role |
|--------|------|
-| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
-| `loader_http_client` | HTTP client for model download/upload from Loader service |
+| `inference` | Core orchestrator: engine lifecycle, stream-based image/video processing, postprocessing |
+| `loader_http_client` | HTTP client for model download/upload (Loader) and API queries (Annotations service) |
+| `media_hash` | XxHash64 content hashing with sampling algorithm for media identification |
## Internal Interfaces
@@ -23,25 +24,42 @@
```
cdef class Inference:
__init__(loader_client)
-cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
-cpdef list detect_single_image(bytes image_bytes, dict config_dict)
+cpdef run_detect_image(bytes image_bytes, AIRecognitionConfig ai_config, str media_name,
+                       annotation_callback, status_callback=None)
+cpdef run_detect_video(bytes video_bytes, AIRecognitionConfig ai_config, str media_name,
+                       str save_path, annotation_callback, status_callback=None)
cpdef stop()
# Internal pipeline stages:
cdef init_ai()
cdef preprocess(frames) -> ndarray
cdef postprocess(output, ai_config) -> list[list[Detection]]
cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
cdef _process_images(AIRecognitionConfig, list[str] paths)
cdef _process_video(AIRecognitionConfig, str video_name)
cdef _process_video_pyav(AIRecognitionConfig, str original_media_name, object container)
cdef _process_video_batch(AIRecognitionConfig, list frames, list timestamps, str name, int frame_count, int total, int model_w)
cdef _append_image_frame_entries(AIRecognitionConfig, list all_frame_data, frame, str original_media_name)
cdef _finalize_image_inference(AIRecognitionConfig, list all_frame_data)
cdef split_to_tiles(frame, str media_stem, tile_size, overlap_percent)
cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection] (delegated to engine)
```
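The overlap-removal stage above lends itself to a short sketch. This is a minimal version assuming axis-aligned boxes with a `confidence` field and an IoU overlap criterion (both assumptions; per the interface, the actual logic is delegated to the engine):

```python
# Hedged sketch of detection overlap removal. Detection fields, the IoU
# criterion, and the greedy ordering are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float

def iou(a: Detection, b: Detection) -> float:
    # Intersection-over-union of two axis-aligned boxes in absolute coordinates.
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(detections: list, threshold: float) -> list:
    # Greedy non-maximum suppression: keep higher-confidence boxes, drop any
    # box whose IoU with an already-kept box exceeds the threshold.
    kept = []
    for d in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept
```

Absolute coordinates make the same predicate usable for cross-tile deduplication, since boxes from adjacent tiles land in one shared frame of reference.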
### Free Functions
```
def ai_config_from_dict(dict data) -> AIRecognitionConfig
```
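A plausible shape for `ai_config_from_dict`, sketched with a dataclass. The field names here are hypothetical; the real `AIRecognitionConfig` comes from the Domain layer:

```python
# Hypothetical sketch of ai_config_from_dict; real fields may differ.
from dataclasses import dataclass, fields

@dataclass
class AIRecognitionConfig:
    model_name: str = ""
    confidence_threshold: float = 0.5
    tile_size: int = 640

def ai_config_from_dict(data: dict) -> AIRecognitionConfig:
    # Ignore unknown keys so older payloads keep deserializing.
    known = {f.name for f in fields(AIRecognitionConfig)}
    return AIRecognitionConfig(**{k: v for k, v in data.items() if k in known})
```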
### LoaderHttpClient
```
class LoaderHttpClient:
-load_big_small_resource(str filename, str directory) -> LoadResult
-upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
+cdef load_big_small_resource(str filename, str directory) -> LoadResult
+cdef upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
+cpdef fetch_user_ai_settings(str user_id, str bearer_token) -> object
+cpdef fetch_media_path(str media_id, str bearer_token) -> object
```
### media_hash
```
def compute_media_content_hash(data: bytes, virtual: bool = False) -> str
```
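The sampling layout is not spelled out in this doc; the sketch below shows one plausible head/middle/tail scheme, and substitutes stdlib `hashlib.blake2b` for XxHash64 so it runs without third-party packages. The `sample` size and the meaning of the `virtual` flag are likewise assumptions.

```python
# Illustrative sampling hash. The real module uses XxHash64; blake2b is a
# stdlib stand-in, and the sampling scheme and virtual flag are assumptions.
import hashlib

def compute_media_content_hash(data: bytes, virtual: bool = False,
                               sample: int = 65536) -> str:
    h = hashlib.blake2b(digest_size=8)      # 64-bit digest, like xxh64
    if len(data) <= 3 * sample:
        h.update(data)                      # small payload: hash everything
    else:
        h.update(data[:sample])             # head
        mid = len(data) // 2
        h.update(data[mid:mid + sample])    # middle
        h.update(data[-sample:])            # tail
    h.update(len(data).to_bytes(8, "little"))  # length guards against collisions
    if virtual:
        h.update(b"virtual")                # assumed: namespace virtual media
    return h.hexdigest()
```

Sampling keeps hashing O(1) in media size, at the cost of missing edits confined to unsampled regions; folding in the total length catches most truncations.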
## External API
@@ -52,9 +70,10 @@ None — internal component, consumed by API layer.
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
-- Video frames read via OpenCV VideoCapture
-- Images read via OpenCV imread
-- All processing is in-memory
+- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`)
+- Images decoded from in-memory bytes via `cv2.imdecode`
+- Video bytes concurrently written to persistent storage path in background thread
+- All inference processing is in-memory
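The concurrent write-through can be sketched with stdlib threading. Function names here are hypothetical; the real code decodes via PyAV:

```python
# Sketch: run in-memory processing while a background thread persists the
# original bytes to save_path. Names are illustrative, not the real API.
import threading

def _persist(path: str, payload: bytes) -> None:
    with open(path, "wb") as f:
        f.write(payload)

def process_video_bytes(video_bytes: bytes, save_path: str, process):
    writer = threading.Thread(target=_persist, args=(save_path, video_bytes),
                              daemon=True)
    writer.start()                 # write-through starts immediately
    result = process(video_bytes)  # e.g. av.open(io.BytesIO(video_bytes)) + inference
    writer.join()                  # file is guaranteed on disk before returning
    return result
```

Because decoding reads only the in-memory copy, the disk write never blocks the inference path; the final `join()` just ensures the persisted file exists before the result is delivered.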
## Implementation Details
@@ -92,8 +111,10 @@ None — internal component, consumed by API layer.
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
-### Video Processing
+### Video Processing (PyAV-based — AZ-173)
+- Reads video from in-memory `BytesIO` via `av.open` (no filesystem read needed)
+- Concurrently writes bytes to `save_path` for persistent storage
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
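The validity heuristics reduce to a predicate over consecutive annotation candidates. A sketch, where the thresholds and dict fields are assumptions and the spatial-movement check is omitted for brevity:

```python
# Illustrative annotation-validity predicate; thresholds and field names
# are assumptions, and the spatial-movement heuristic is omitted.
def is_annotation_valid(prev, cur, min_gap_s=1.0, min_conf_gain=0.05):
    if prev is None:
        return True                                      # first candidate always valid
    if cur["ts"] - prev["ts"] >= min_gap_s:
        return True                                      # time gap
    if len(cur["detections"]) > len(prev["detections"]):
        return True                                      # detection count increase
    if cur["max_conf"] - prev["max_conf"] >= min_conf_gain:
        return True                                      # confidence improvement
    return False
```

Any single satisfied heuristic admits the frame, so the checks act as an OR-chain rather than a combined score.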
@@ -107,9 +128,9 @@ None — internal component, consumed by API layer.
- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
-- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
+- `init_ai()` called on every detection entry point — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
-- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
+- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect_image` invocation
## Dependency Graph
@@ -122,6 +143,7 @@ graph TD
inference -.-> onnx_engine
inference -.-> tensorrt_engine
inference --> loader_http_client
main --> media_hash
```
## Logging Strategy