[AZ-178] Implement streaming video detection endpoint

- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
2026-04-22 09:36:32 +00:00 · 2026-04-01 03:11:43 +03:00
parent e65d8da6a3
commit be4cab4fcb
42 changed files with 2983 additions and 29 deletions
@@ -28,6 +28,8 @@ cdef class Inference:
                           annotation_callback, status_callback=None)
    cpdef run_detect_video(bytes video_bytes, AIRecognitionConfig ai_config, str media_name,
                           str save_path, annotation_callback, status_callback=None)
+    cpdef run_detect_video_stream(object readable, AIRecognitionConfig ai_config, str media_name,
+                                  annotation_callback, status_callback=None)
    cpdef stop()

    # Internal pipeline stages:
@@ -60,6 +62,7 @@ class LoaderHttpClient:

 ```
 def compute_media_content_hash(data: bytes, virtual: bool = False) -> str
+def compute_media_content_hash_from_file(path: str, virtual: bool = False) -> str
 ```

 ## External API
@@ -70,9 +73,10 @@ None — internal component, consumed by API layer.

 - Model bytes downloaded from Loader service (HTTP)
 - Converted TensorRT engines uploaded back to Loader for caching
- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`)
+- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`) — `run_detect_video`
+- Video frames decoded from streaming file-like via PyAV (`av.open(readable)`) — `run_detect_video_stream` (AZ-178)
 - Images decoded from in-memory bytes via `cv2.imdecode`
- Video bytes concurrently written to persistent storage path in background thread
+- Video bytes concurrently written to persistent storage path in background thread (`run_detect_video`) or via StreamingBuffer (`run_detect_video_stream`)
 - All inference processing is in-memory

 ## Implementation Details
@@ -119,6 +123,15 @@ None — internal component, consumed by API layer.
 - Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
 - JPEG encoding of valid frames for annotation images

+### Streaming Video Processing (AZ-178)
+
+- `run_detect_video_stream` accepts a file-like `readable` (e.g. `StreamingBuffer`) instead of `bytes`
+- Opens `av.open(readable)` directly — PyAV calls `read()`/`seek()` on the object as needed
+- No writer thread — the `StreamingBuffer` already persists data to disk as the HTTP handler feeds it chunks
+- Reuses `_process_video_pyav` for all frame decoding, batching, and annotation logic
+- For faststart MP4/MKV/WebM: true streaming (~500 ms to first frame)
+- For standard MP4 (moov atom at end of file): graceful degradation, since the `SEEK_END` seek blocks until the upload completes and decoding starts only then
+
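The buffer behaviour described in this section can be illustrated with a minimal sketch. The real `StreamingBuffer` is not shown in this diff, so the class name `StreamingBufferSketch` and the method names `append` and `close_writer` below are assumptions made for illustration only. The sketch shows one writer appending to a temp file while one reader blocks in `read()` until bytes arrive, and a `SEEK_END` seek that waits for the upload to finish.

```python
import os
import tempfile
import threading


class StreamingBufferSketch:
    """Hypothetical stand-in for StreamingBuffer: one writer appends, one reader follows."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(suffix=".part")
        self._writer = os.fdopen(fd, "wb")
        # Unbuffered reader so every read sees what the writer has flushed.
        self._reader = open(self.path, "rb", buffering=0)
        self._cond = threading.Condition()
        self._written = 0       # bytes flushed to disk so far
        self._finished = False  # True once the upload is complete

    def append(self, chunk: bytes) -> None:
        """Called by the HTTP handler for each incoming chunk."""
        with self._cond:
            self._writer.write(chunk)
            self._writer.flush()
            self._written += len(chunk)
            self._cond.notify_all()

    def close_writer(self) -> None:
        """Signal end of upload so blocked readers can return."""
        with self._cond:
            self._writer.close()
            self._finished = True
            self._cond.notify_all()

    def read(self, size: int = -1) -> bytes:
        """Block until data exists past the read position, or the upload ends."""
        n = -1 if size is None else size
        with self._cond:
            if n < 0:
                # Reading "everything" needs the whole upload.
                self._cond.wait_for(lambda: self._finished)
            else:
                pos = self._reader.tell()
                self._cond.wait_for(lambda: self._written > pos or self._finished)
            return self._reader.read(n)

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int:
        with self._cond:
            if whence == os.SEEK_END:
                # The final size is unknown until the upload ends, so wait.
                self._cond.wait_for(lambda: self._finished)
            return self._reader.seek(offset, whence)

    def tell(self) -> int:
        with self._cond:
            return self._reader.tell()

    def close(self) -> None:
        self._reader.close()
```

An object of this shape could be passed to `av.open(...)` in place of a `BytesIO`, since PyAV only needs `read()` and `seek()` on it. The blocking `SEEK_END` branch is what would turn a moov-at-end MP4 into a wait for the full upload rather than a failure.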
 ### Callbacks

 - `annotation_callback(annotation, percent)` — called per valid annotation
@@ -14,6 +14,7 @@
 | Module | Role |
 |--------|------|
 | `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming, media lifecycle, DB-driven config resolution |
+| `streaming_buffer` | File-like object for concurrent write+read — enables true streaming video detection (AZ-178) |

 ## External API Specification

@@ -50,6 +51,13 @@
 **Behavior** (AZ-173, AZ-175): Accepts both images and videos. Detects upload kind by extension, falls back to content probing. If authenticated: computes content hash, persists to storage, creates media record, tracks status lifecycle (New → AI Processing → AI Processed / Error).
 **Errors**: 400 (empty/invalid image data), 422 (runtime error), 503 (engine unavailable).

+### POST /detect/video
+
+**Input**: Raw binary body (not multipart). Headers: `X-Filename` (e.g. `clip.mp4`), optional `X-Config` (JSON), optional `Authorization: Bearer {token}`, optional `X-Refresh-Token`.
+**Response**: `{"status": "started", "mediaId": "..."}`
+**Behavior** (AZ-178): True streaming video detection. Bypasses Starlette multipart buffering by accepting raw body via `request.stream()`. Creates a `StreamingBuffer` (temp file), starts inference thread immediately, feeds HTTP chunks to the buffer as they arrive. PyAV reads from the buffer concurrently, decoding frames and running inference. Detections broadcast via SSE in real-time during upload. After upload: computes content hash from file (3 KB I/O), renames to permanent path, creates media record if authenticated.
+**Errors**: 400 (non-video extension).
+
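A client call to this endpoint might be built as in the sketch below, which uses only the standard library. The helper names `iter_file_chunks` and `build_detect_video_request`, the 64 KB chunk size, and the `Content-Type` value are assumptions for illustration. Only the path and the `X-Filename`, `X-Config`, and `Authorization` headers come from the description above.

```python
import json
import urllib.request
from typing import Iterator, Optional


def iter_file_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file in fixed-size chunks so it is never fully loaded in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def build_detect_video_request(
    base_url: str,
    path: str,
    filename: str,
    config: Optional[dict] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a raw-body (non-multipart) POST for /detect/video."""
    headers = {
        "X-Filename": filename,
        # Assumed content type; the endpoint description does not specify one.
        "Content-Type": "application/octet-stream",
    }
    if config is not None:
        headers["X-Config"] = json.dumps(config)
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/detect/video",
        data=iter_file_chunks(path),  # iterable body, sent chunk by chunk
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` should send the body with chunked transfer encoding, because no `Content-Length` is set. The server can therefore begin inference before the client has finished reading the file.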
 ### POST /detect/{media_id}

 **Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
@@ -82,7 +90,8 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m
 - `TokenManager.decode_user_id` extracts user identity from multiple JWT claim formats (sub, userId, nameid, SAML)
 - DB-driven config via `_resolve_media_for_detect`: fetches AI settings from Annotations, merges nested sections and casing variants
 - Media lifecycle: `_post_media_record` + `_put_media_status` manage status transitions via Annotations API
- Content hashing via `compute_media_content_hash` (XxHash64 with sampling) for media deduplication
+- Content hashing via `compute_media_content_hash` (bytes, XxHash64 with sampling) and `compute_media_content_hash_from_file` (file on disk, 3 KB I/O) for media deduplication
+- `StreamingBuffer` for `/detect/video`: concurrent file append + read via `threading.Condition`, enables PyAV to decode frames as HTTP chunks arrive
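The idea of hashing a file with about 3 KB of I/O can be sketched as follows. This is not the project's `compute_media_content_hash_from_file`: the sampling layout (three 1 KB samples at the head, middle, and tail, plus the file size) is an assumption, and `hashlib.blake2b` with a 64-bit digest stands in for XxHash64 so the example needs no third-party package. The function names are hypothetical.

```python
import hashlib
import os

SAMPLE_SIZE = 1024  # assumed: three 1 KB samples, about 3 KB of reads in total


def _sample_offsets(size: int):
    """Head, middle, and tail offsets for a file of the given size."""
    return (0, (size - SAMPLE_SIZE) // 2, size - SAMPLE_SIZE)


def sampled_hash_from_file(path: str) -> str:
    """Hash the file size plus three small samples, seeking instead of reading it all."""
    size = os.path.getsize(path)
    h = hashlib.blake2b(digest_size=8)  # 64-bit digest, stand-in for XxHash64
    h.update(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        if size <= 3 * SAMPLE_SIZE:
            h.update(f.read())  # small file: hash everything
        else:
            for offset in _sample_offsets(size):
                f.seek(offset)
                h.update(f.read(SAMPLE_SIZE))
    return h.hexdigest()


def sampled_hash_from_bytes(data: bytes) -> str:
    """Same scheme over in-memory bytes, so both paths yield the same hash."""
    size = len(data)
    h = hashlib.blake2b(digest_size=8)
    h.update(size.to_bytes(8, "little"))
    if size <= 3 * SAMPLE_SIZE:
        h.update(data)
    else:
        for offset in _sample_offsets(size):
            h.update(data[offset:offset + SAMPLE_SIZE])
    return h.hexdigest()
```

Under this scheme the file-based and bytes-based functions agree for identical content, which is what deduplication needs. The trade-off is that changes outside the sampled regions go undetected when the file size is unchanged.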

 ## Caveats

@@ -103,6 +112,7 @@ graph TD
    main --> constants_inf
    main --> loader_http_client
    main --> media_hash
+    main --> streaming_buffer
 ```

 ## Logging Strategy