mirror of https://github.com/azaion/detections.git (synced 2026-04-22 16:06:31 +00:00)
[AZ-178] Implement streaming video detection endpoint
- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
@@ -10,6 +10,7 @@
| F4 | SSE Event Streaming | Client GET /detect/stream | API | Medium |
| F5 | Engine Initialization | First detection request | Inference Pipeline, Engines, Loader | High |
| F6 | TensorRT Background Conversion | No pre-built TensorRT engine | Inference Pipeline, Engines, Loader | Medium |
| F7 | Streaming Video Detection | Client POST /detect/video | API, StreamingBuffer, Inference Pipeline, Engines, Domain, Annotations | High |

## Flow Dependencies

@@ -18,9 +19,10 @@
| F1 | F5 (for meaningful status) | — |
| F2 | F5 (engine must be ready) | Annotations (media lifecycle) |
| F3 | F5 (engine must be ready) | F4 (via SSE event queues), Annotations (settings, media lifecycle) |
| F4 | — | F3 (receives events) |
| F4 | — | F3, F7 (receives events) |
| F5 | — | F6 (triggers conversion if needed) |
| F6 | F5 (triggered by init failure) | F5 (provides converted bytes) |
| F7 | F5 (engine must be ready) | F4 (via SSE event queues), Annotations (media lifecycle) |

---

@@ -317,3 +319,255 @@ sequenceDiagram
    INF->>STATUS: set_status(ENABLED)
    Note over INF: Next init_ai() call will load from _converted_model_bytes
```

---

## Flow F7: Streaming Video Detection (AZ-178)

### Description

Client uploads a video file as raw binary and gets near-real-time detections via SSE as frames are decoded — **during** the upload, not after. The endpoint bypasses FastAPI's multipart buffering entirely, using `request.stream()` to read the HTTP body chunk-by-chunk. Each chunk is simultaneously written to a temp file (via `StreamingBuffer`) and read by PyAV in a background inference thread. First detections appear within ~500ms of the first decodable frames arriving at the API. Peak memory usage is bounded by the model batch size × frame size (tens of MB), regardless of video file size.
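The concurrency shape of that chunk loop can be sketched in miniature. This is an illustrative reduction, not the real handler: a plain `queue.Queue` stands in for `StreamingBuffer`, the `stream` argument stands in for FastAPI's `request.stream()`, and `consume` stands in for the inference thread.

```python
import asyncio
import queue


def consume(q: queue.Queue) -> bytes:
    # Stand-in for the inference thread: drain chunks until the EOF marker.
    total = bytearray()
    while (chunk := q.get()) is not None:
        total += chunk
    return bytes(total)


async def handle_upload(stream) -> bytes:
    loop = asyncio.get_running_loop()
    q: queue.Queue = queue.Queue()                 # stand-in for StreamingBuffer
    # Start the consumer first: inference begins before the upload finishes.
    worker = loop.run_in_executor(None, consume, q)
    try:
        async for chunk in stream:                 # FastAPI: request.stream()
            q.put(chunk)                           # real code: buffer.append(chunk)
    finally:
        q.put(None)                                # real code: buffer.close_writer()
    return await worker
```

Feeding it an async generator of chunks returns the reassembled bytes, with the consumer running concurrently on a worker thread the whole time.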
### Activity Diagram — Full Data Pipeline

```mermaid
flowchart TD
    subgraph CLIENT ["Client (Browser)"]
        C1([Open SSE connection<br/>GET /detect/stream])
        C2([Start upload<br/>POST /detect/video])
        C3([Receive SSE events<br/>during upload])
    end

    subgraph API ["API Layer — main.py (async event loop)"]
        A1[Parse headers:<br/>X-Filename, X-Config, Auth]
        A2{Valid video<br/>extension?}
        A3[Create StreamingBuffer<br/>backed by temp file]
        A4[Start inference thread<br/>via run_in_executor]
        A5["Read chunk from<br/>request.stream()"]
        A6[buffer.append chunk<br/>via run_in_executor]
        A7{More chunks?}
        A8[buffer.close_writer<br/>signal EOF]
        A9[Compute content hash<br/>from temp file on disk<br/>reads only 3 KB]
        A10[Rename temp file →<br/>permanent storage path]
        A11[Create media record<br/>POST /api/media]
        A12["Return {status: started,<br/>mediaId: hash}"]
        A13[Register background task<br/>to await inference completion]
    end

    subgraph BUF ["StreamingBuffer — streaming_buffer.py"]
        B1[/"Temp file on disk<br/>(single file, two handles)"/]
        B2["append(data):<br/>write + flush + notify"]
        B3["read(size):<br/>block if ahead of writer<br/>return available bytes"]
        B4["seek(offset, whence):<br/>SEEK_END blocks until EOF"]
        B5["close_writer():<br/>set EOF flag, notify all"]
    end

    subgraph INF ["Inference Thread — inference.pyx"]
        I1["av.open(buffer)<br/>PyAV reads via buffer.read()"]
        I2{Moov at start?}
        I3[Decode frames immediately<br/>~500ms latency]
        I4["Blocks on seek(0, 2)<br/>until upload completes"]
        I5["Decode batch of frames<br/>(frame_period_recognition sampling)"]
        I6["engine.process_frames(batch)"]
        I7{Detections found?}
        I8["on_annotation callback<br/>→ SSE event broadcast"]
        I9{More frames?}
        I10[send_detection_status]
    end

    C2 --> A1
    A1 --> A2
    A2 -->|No| ERR([400 Bad Request])
    A2 -->|Yes| A3
    A3 --> A4
    A4 --> A5

    A5 --> A6
    A6 --> B2
    B2 --> B1
    A6 --> A7
    A7 -->|Yes| A5
    A7 -->|No| A8
    A8 --> B5

    A8 --> A9
    A9 --> A10
    A10 --> A11
    A11 --> A12
    A12 --> A13

    A4 -.->|background thread| I1
    I1 --> I2
    I2 -->|"Yes (faststart MP4,<br/>MKV, WebM)"| I3
    I2 -->|"No (standard MP4)"| I4
    I4 --> I3
    I3 --> I5
    I5 --> I6
    I6 --> I7
    I7 -->|Yes| I8
    I8 --> C3
    I7 -->|No| I9
    I8 --> I9
    I9 -->|Yes| I5
    I9 -->|No| I10

    B3 -.->|"PyAV calls<br/>read()"| I1

    style BUF fill:#e8f4fd,stroke:#2196F3
    style INF fill:#fce4ec,stroke:#e91e63
    style API fill:#e8f5e9,stroke:#4CAF50
    style CLIENT fill:#fff3e0,stroke:#FF9800
```
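Step A9's minimal-I/O hash ("computing hashes directly from files with minimal I/O", per the commit message) might look like the sketch below. The exact fields the real `compute_media_content_hash_from_file` hashes are not specified here; hashing the first 3 KB plus the file size is an assumption.

```python
import hashlib
import os


def compute_media_content_hash_from_file(path: str, sample_bytes: int = 3072) -> str:
    """Content hash with a single small disk read (sketch; hashed fields assumed)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(sample_bytes))                 # one 3 KB read from disk
    h.update(str(os.path.getsize(path)).encode())      # distinguish same-prefix files
    return h.hexdigest()
```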
### Sequence Diagram — Concurrent Timeline

```mermaid
sequenceDiagram
    participant Client
    participant SSE as SSE /detect/stream
    participant API as main.py (async)
    participant BUF as StreamingBuffer
    participant INF as Inference Thread
    participant PyAV
    participant ENG as Engine (ONNX/TRT)
    participant ANN as Annotations Service

    Client->>SSE: GET /detect/stream (open)
    Client->>API: POST /detect/video (raw body, streaming)
    API->>API: Parse X-Filename, X-Config, Auth headers
    API->>BUF: Create StreamingBuffer (temp file)
    API->>INF: Start in executor thread

    par Upload stream (async event loop) and Inference (background thread)
        loop Each HTTP body chunk (~8-64 KB)
            API->>BUF: append(chunk) → write + flush + notify
        end

        INF->>PyAV: av.open(buffer)
        Note over PyAV,BUF: PyAV calls buffer.read().<br/>Blocks when no data yet.<br/>Resumes as chunks arrive.

        loop Each decodable frame batch
            PyAV->>BUF: read(size) → returns available bytes
            BUF-->>PyAV: video data
            PyAV-->>INF: decoded frames (BGR numpy)
            INF->>ENG: process_frames(batch)
            ENG-->>INF: detections
            opt Valid detections
                INF->>SSE: DetectionEvent (via callback)
                SSE-->>Client: data: {...detections...}
            end
        end
    end

    API->>BUF: close_writer() → EOF signal
    Note over INF: PyAV reads remaining frames, finishes

    API->>API: compute_media_content_hash_from_file(temp file) — reads 3 KB
    API->>API: Rename temp file → {hash}{ext}

    opt Authenticated user
        API->>ANN: POST /api/media (create record)
        API->>ANN: PUT /api/media/{id}/status (AI_PROCESSING)
    end

    API-->>Client: {"status": "started", "mediaId": "abc123"}

    Note over API: Background task awaits inference completion

    INF-->>API: Inference completes
    opt Authenticated user
        API->>ANN: PUT /api/media/{id}/status (AI_PROCESSED)
    end
    API->>SSE: DetectionEvent(status=AIProcessed, percent=100)
    SSE-->>Client: data: {...status: AIProcessed...}
```
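On the client side, the events in this timeline arrive as `data:` lines on the already-open `/detect/stream` response. A minimal parser for that wire format (a sketch: real SSE also carries `event:`/`id:` fields and multi-line `data:` values, ignored here, and the payload field names are illustrative):

```python
import json


def iter_sse_data(lines):
    """Yield JSON payloads from the `data:` lines of an SSE text stream."""
    for line in lines:
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


# The kind of frames the diagram shows, as they would appear on the wire:
wire = [
    'data: {"detections": [{"label": "person", "score": 0.91}]}',
    "",
    'data: {"status": "AIProcessed", "percent": 100}',
]
events = list(iter_sse_data(wire))
```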
### Flowchart — StreamingBuffer Read/Write Coordination

```mermaid
flowchart TD
    subgraph WRITER ["Writer (HTTP handler thread)"]
        W1["Receive HTTP chunk"]
        W2["Acquire Condition lock"]
        W3["file.write(chunk) + flush()"]
        W4["_written += len(chunk)"]
        W5["notify_all() → wake reader"]
        W6["Release lock"]
        W7{More chunks?}
        W8["close_writer():<br/>set _eof = True<br/>notify_all()"]
    end

    subgraph READER ["Reader (PyAV / Inference thread)"]
        R1["PyAV calls read(size)"]
        R2["Acquire Condition lock"]
        R3{"_written > pos?"}
        R4["cond.wait()<br/>(releases lock, sleeps)"]
        R5["Calculate to_read =<br/>min(size, available)"]
        R6["Release lock"]
        R7["file.read(to_read)<br/>(outside lock)"]
        R8["Return bytes to PyAV"]
        R9{"_eof and<br/>available == 0?"}
        R10["Return b'' (EOF)"]
    end

    W1 --> W2 --> W3 --> W4 --> W5 --> W6 --> W7
    W7 -->|Yes| W1
    W7 -->|No| W8

    R1 --> R2 --> R3
    R3 -->|Yes| R5
    R3 -->|No| R9
    R9 -->|Yes| R10
    R9 -->|No| R4
    R4 -.->|"Woken by<br/>notify_all()"| R3
    R5 --> R6 --> R7 --> R8

    style WRITER fill:#e8f5e9,stroke:#4CAF50
    style READER fill:#fce4ec,stroke:#e91e63
```
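The coordination above translates almost line-for-line into a `threading.Condition` around two handles on one temp file. A sketch that follows the flowchart (the real `streaming_buffer.py` may differ in details):

```python
import os
import tempfile
import threading


class StreamingBuffer:
    """One temp file, two handles: async writer appends, PyAV reader blocks."""

    def __init__(self) -> None:
        fd, self.path = tempfile.mkstemp(suffix=".part")
        self._wf = os.fdopen(fd, "wb")       # writer handle
        self._rf = open(self.path, "rb")     # independent reader handle
        self._cond = threading.Condition()
        self._written = 0
        self._eof = False

    # Writer side (HTTP handler, via run_in_executor)
    def append(self, data: bytes) -> None:
        with self._cond:
            self._wf.write(data)
            self._wf.flush()                 # make bytes visible to the reader
            self._written += len(data)
            self._cond.notify_all()          # wake a blocked read()/seek()

    def close_writer(self) -> None:
        with self._cond:
            self._eof = True
            self._cond.notify_all()

    # Reader side (PyAV / inference thread)
    def read(self, size: int = -1) -> bytes:
        with self._cond:
            while True:
                available = self._written - self._rf.tell()
                if available > 0 or self._eof:
                    break
                self._cond.wait()            # releases the lock while sleeping
            if available == 0:
                return b""                   # EOF: writer closed, all consumed
            to_read = available if size < 0 else min(size, available)
        return self._rf.read(to_read)        # actual disk read outside the lock

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int:
        if whence == os.SEEK_END:            # SEEK_END needs the final length,
            with self._cond:                 # so block until the upload ends
                while not self._eof:
                    self._cond.wait()
        return self._rf.seek(offset, whence)
```

PyAV accepts any file-like object exposing `read`/`seek`, so `av.open(buffer)` works directly against this interface; the `seek(0, 2)` branch is what produces the graceful degradation for moov-at-end MP4s.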
### Data Flow

| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | Client | API | Raw video bytes (streaming) | HTTP POST body chunks |
| 2 | API | StreamingBuffer | Byte chunks (8-64 KB each) | `append(bytes)` |
| 3 | StreamingBuffer | Temp file | Same chunks | `file.write()` + `flush()` |
| 4 | StreamingBuffer | PyAV (Inference thread) | Byte segments on demand | `read(size)` blocks when ahead |
| 5 | PyAV | Inference | Decoded BGR numpy frames | ndarray |
| 6 | Inference | Engine | Preprocessed batch | ndarray |
| 7 | Engine | Inference | Raw detections | ndarray |
| 8 | Inference | SSE clients | DetectionEvent | SSE JSON via `loop.call_soon_threadsafe` |
| 9 | API | Temp file | Content hash (3 KB read) | `compute_media_content_hash_from_file` |
| 10 | API | Disk | Rename temp → permanent path | `os.rename` |
| 11 | API | Annotations Service | Media record + status | HTTP POST/PUT JSON |
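Step 8 is the only point where the inference thread touches asyncio state, so it has to hop onto the event loop rather than mutate queues directly. A sketch of that callback wiring (the `DetectionEvent` payload shape and per-client queue fan-out are assumptions):

```python
import asyncio


def make_on_annotation(loop: asyncio.AbstractEventLoop, queues: list):
    """Build an inference-thread callback that fans events out to SSE queues."""
    def on_annotation(event: dict) -> None:
        # Called from the worker thread. asyncio.Queue is not thread-safe,
        # so schedule the puts on the loop instead of touching queues here.
        def _fanout() -> None:
            for q in queues:
                q.put_nowait(event)
        loop.call_soon_threadsafe(_fanout)
    return on_annotation
```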
### Memory Profile (2 GB video)

| Stage | Current (F2) | Streaming (F7) |
|-------|-------------|----------------|
| Starlette buffering | 2 GB (SpooledTemporaryFile) | 0 (raw stream) |
| `file.read()` / chunk buffer | 2 GB (full bytes) | ~64 KB (one chunk) |
| BytesIO for PyAV | 2 GB (copy) | 0 (reads from buffer) |
| Writer thread | 2 GB (same ref) | 0 (no separate writer) |
| **Peak process RAM** | **~4+ GB** | **~50 MB** (batch × frame) |
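The ~50 MB bound falls out of batch × frame arithmetic. With an assumed batch of 8 frames at 1080p BGR (the real batch size depends on model configuration):

```python
# Streaming peak memory is bounded by batch × frame, independent of file size.
# Assumed illustrative figures: 8 frames per batch, 1920x1080, 3 bytes/pixel.
batch, width, height, channels = 8, 1920, 1080, 3
frame_bytes = width * height * channels      # 6,220,800 bytes ≈ 6.2 MB per frame
peak_bytes = batch * frame_bytes             # 49,766,400 bytes ≈ 50 MB
print(f"{peak_bytes / 1e6:.1f} MB")          # → 49.8 MB
```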
### Format Compatibility

| Container Format | Moov Location | Streaming Behavior |
|-----------------|--------------|-------------------|
| MP4 (faststart) | Beginning | True streaming — first frame decoded in ~500ms |
| MKV / WebM | Beginning | True streaming — first frame decoded in ~500ms |
| MP4 (standard) | End of file | Graceful degradation — `seek(0, 2)` blocks until upload completes, then decoding starts |
| MOV, AVI | Varies | Depends on header location |
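For MP4-family containers, "moov location" is checkable directly: the file is a sequence of top-level boxes (big-endian 4-byte size + 4-byte type), and streaming works when `moov` precedes `mdat`. A self-contained check (a sketch covering the common 32-bit and 64-bit box sizes, not every MP4 edge case):

```python
import struct


def moov_before_mdat(f):
    """Return True if `moov` precedes `mdat` (streamable / faststart),
    False if `mdat` comes first, None if neither top-level box is found."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            return None
        size, box = struct.unpack(">I4s", header)
        if box == b"moov":
            return True
        if box == b"mdat":
            return False
        if size == 1:                            # 64-bit "largesize" box
            size = struct.unpack(">Q", f.read(8))[0]
            f.seek(size - 16, 1)
        elif size == 0:                          # box runs to end of file
            return None
        else:
            f.seek(size - 8, 1)                  # skip payload to next header
```

`ffmpeg -movflags +faststart` is the standard way to produce MP4s for which this returns True.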
### Error Scenarios

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Non-video extension | API | Extension check | 400 Bad Request |
| Client disconnects mid-upload | `request.stream()` | Exception | `buffer.close_writer()` called in except, inference thread gets EOF |
| Engine unavailable | Inference thread | `engine is None` | Error event via SSE |
| PyAV decode failure | Inference thread | Exception | Error event via SSE, media status set to Error |
| Disk full | `StreamingBuffer.append` | OSError | Propagated to API handler |
| Annotations service down | `_post_media_record` | Exception caught | Silently continues, detections still work |