mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 10:16:31 +00:00
[AZ-178] Implement streaming video detection endpoint
- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
@@ -28,6 +28,8 @@ cdef class Inference:
                           annotation_callback, status_callback=None)
    cpdef run_detect_video(bytes video_bytes, AIRecognitionConfig ai_config, str media_name,
                           str save_path, annotation_callback, status_callback=None)
+   cpdef run_detect_video_stream(object readable, AIRecognitionConfig ai_config, str media_name,
+                                 annotation_callback, status_callback=None)
    cpdef stop()

    # Internal pipeline stages:
@@ -60,6 +62,7 @@ class LoaderHttpClient:

```
def compute_media_content_hash(data: bytes, virtual: bool = False) -> str
+def compute_media_content_hash_from_file(path: str, virtual: bool = False) -> str
```

## External API
@@ -70,9 +73,10 @@ None — internal component, consumed by API layer.

- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
-- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`)
+- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`) — `run_detect_video`
+- Video frames decoded from streaming file-like via PyAV (`av.open(readable)`) — `run_detect_video_stream` (AZ-178)
- Images decoded from in-memory bytes via `cv2.imdecode`
-- Video bytes concurrently written to persistent storage path in background thread
+- Video bytes concurrently written to persistent storage path in background thread (`run_detect_video`) or via StreamingBuffer (`run_detect_video_stream`)
- All inference processing is in-memory

## Implementation Details
@@ -119,6 +123,15 @@ None — internal component, consumed by API layer.
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images

### Streaming Video Processing (AZ-178)

- `run_detect_video_stream` accepts a file-like `readable` (e.g. `StreamingBuffer`) instead of `bytes`
- Opens `av.open(readable)` directly — PyAV calls `read()`/`seek()` on the object as needed
- No writer thread — the `StreamingBuffer` already persists data to disk as the HTTP handler feeds it chunks
- Reuses `_process_video_pyav` for all frame decoding, batching, and annotation logic
- For faststart MP4/MKV/WebM: true streaming (~500 ms to first frame)
- For standard MP4 (moov at end): graceful degradation via blocking SEEK_END

### Callbacks

- `annotation_callback(annotation, percent)` — called per valid annotation

@@ -14,6 +14,7 @@
| Module | Role |
|--------|------|
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming, media lifecycle, DB-driven config resolution |
+| `streaming_buffer` | File-like object for concurrent write+read — enables true streaming video detection (AZ-178) |

## External API Specification

@@ -50,6 +51,13 @@
**Behavior** (AZ-173, AZ-175): Accepts both images and videos. Detects the upload kind by extension, falling back to content probing. If authenticated: computes content hash, persists to storage, creates media record, tracks status lifecycle (New → AI Processing → AI Processed / Error).
**Errors**: 400 (empty/invalid image data), 422 (runtime error), 503 (engine unavailable).
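
The extension-first, probe-fallback decision can be sketched in a few lines (the suffix sets, helper name, and magic-byte fallback here are illustrative; the real `_detect_upload_kind` probes with cv2/PyAV instead):

```python
from pathlib import Path

# Illustrative suffix sets — the real service may accept a different list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}
VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}

def detect_upload_kind(filename: str, data: bytes) -> tuple[str, str]:
    """Return (kind, ext). The extension wins; fall back to magic-byte probing."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image", ext
    if ext in VIDEO_EXTS:
        return "video", ext
    # Fallback: cheap probe on well-known magic bytes.
    if data[:3] == b"\xff\xd8\xff":               # JPEG SOI marker
        return "image", ".jpg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":          # PNG signature
        return "image", ".png"
    if len(data) >= 12 and data[4:8] == b"ftyp":  # MP4/MOV box header
        return "video", ".mp4"
    raise ValueError("unrecognized upload content")
```
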

### POST /detect/video

**Input**: Raw binary body (not multipart). Headers: `X-Filename` (e.g. `clip.mp4`), optional `X-Config` (JSON), optional `Authorization: Bearer {token}`, optional `X-Refresh-Token`.
**Response**: `{"status": "started", "mediaId": "..."}`
**Behavior** (AZ-178): True streaming video detection. Bypasses Starlette multipart buffering by accepting the raw body via `request.stream()`. Creates a `StreamingBuffer` (temp file), starts the inference thread immediately, and feeds HTTP chunks to the buffer as they arrive. PyAV reads from the buffer concurrently, decoding frames and running inference. Detections are broadcast via SSE in real time during the upload. After the upload: computes the content hash from the file (3 KB I/O), renames it to the permanent path, and creates a media record if authenticated.
**Errors**: 400 (non-video extension).

### POST /detect/{media_id}

**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
@@ -82,7 +90,8 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m
- `TokenManager.decode_user_id` extracts user identity from multiple JWT claim formats (sub, userId, nameid, SAML)
- DB-driven config via `_resolve_media_for_detect`: fetches AI settings from Annotations, merges nested sections and casing variants
- Media lifecycle: `_post_media_record` + `_put_media_status` manage status transitions via Annotations API
-- Content hashing via `compute_media_content_hash` (XxHash64 with sampling) for media deduplication
+- Content hashing via `compute_media_content_hash` (bytes, XxHash64 with sampling) and `compute_media_content_hash_from_file` (file on disk, 3 KB I/O) for media deduplication
+- `StreamingBuffer` for `/detect/video`: concurrent file append + read via `threading.Condition`, enables PyAV to decode frames as HTTP chunks arrive

## Caveats

@@ -103,6 +112,7 @@ graph TD
    main --> constants_inf
    main --> loader_http_client
    main --> media_hash
+   main --> streaming_buffer
```

## Logging Strategy

@@ -39,6 +39,7 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
+| `run_detect_video_stream` | `(object readable, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Processes video from a file-like readable (e.g. StreamingBuffer) via PyAV — true streaming, no bytes in RAM (AZ-178) |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
@@ -75,6 +76,15 @@ Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
7. Valid frames get a JPEG-encoded image attached

### Streaming Video Processing (`run_detect_video_stream` — AZ-178)

1. Accepts a file-like `readable` object (e.g. `StreamingBuffer`) instead of `bytes`
2. Opens directly via `av.open(readable)` — PyAV calls `read()`/`seek()` on the object
3. No writer thread needed — the caller (API layer) manages disk persistence via the same buffer
4. Reuses `_process_video_pyav` for frame decoding, batch inference, and annotation delivery
5. For faststart MP4/MKV/WebM: frames are decoded as bytes stream in (~500 ms latency)
6. For standard MP4 (moov at end): PyAV's `seek(0, 2)` blocks until the buffer signals EOF, then decoding starts

### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
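
A worked example of the formula, with made-up sensor and flight numbers:

```python
def gsd_m_per_px(sensor_width_m: float, altitude_m: float,
                 focal_length_m: float, image_width_px: int) -> float:
    """Ground sampling distance in meters per pixel."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

# Example (illustrative values): 13.2 mm sensor, 100 m altitude,
# 8.8 mm focal length, 5472 px image width.
gsd = gsd_m_per_px(0.0132, 100.0, 0.0088, 5472)
# ≈ 0.0274 m/px, i.e. each pixel covers about 2.7 cm of ground
```
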
@@ -86,7 +96,7 @@ Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file

## Consumers

-- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`, reads `ai_availability_status` and `is_engine_ready`
+- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`/`run_detect_video_stream`, reads `ai_availability_status` and `is_engine_ready`

## Data Models

@@ -107,5 +117,6 @@ None.
## Tests

- `tests/test_ai_config_from_dict.py` — tests `ai_config_from_dict` helper
+- `tests/test_az178_streaming_video.py` — tests `run_detect_video_stream` via the `/detect/video` endpoint and `StreamingBuffer`
- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API

@@ -11,7 +11,8 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Returns AI engine availability status |
-| POST | `/detect` | Image/video detection with media lifecycle management |
+| POST | `/detect` | Image/video detection with media lifecycle management (buffered) |
+| POST | `/detect/video` | Streaming video detection — inference starts as upload bytes arrive (AZ-178) |
| POST | `/detect/{media_id}` | Start async detection on media resolved from Annotations service |
| GET | `/detect/stream` | SSE stream of detection events |

@@ -41,7 +42,8 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
| `_detect_upload_kind` | `(filename, data) -> tuple[str, str]` | Determines if upload is image or video by extension, falls back to content probing (cv2/PyAV) |
| `_post_media_record` | `(payload, bearer) -> bool` | Creates media record via `POST /api/media` on Annotations service |
| `_put_media_status` | `(media_id, status, bearer) -> bool` | Updates media status via `PUT /api/media/{media_id}/status` on Annotations service |
-| `compute_media_content_hash` | (imported from `media_hash`) | XxHash64 content hash with sampling |
+| `compute_media_content_hash` | (imported from `media_hash`) | XxHash64 content hash with sampling (from bytes) |
+| `compute_media_content_hash_from_file` | (imported from `media_hash`) | XxHash64 content hash from file on disk — reads only 3 KB |

## Internal Logic

@@ -57,9 +59,10 @@ Returns `HealthResponse` with `status="healthy"` always. `aiAvailability` reflec
4. Parses optional JSON config
5. Extracts auth tokens; if authenticated:
   a. Computes XxHash64 content hash
-  b. Persists file to `VIDEOS_DIR` or `IMAGES_DIR`
-  c. Creates media record via `POST /api/media`
-  d. Sets status to `AI_PROCESSING` via `PUT /api/media/{id}/status`
+  b. For images: persists file to `IMAGES_DIR` synchronously (since `run_detect_image` does not write to disk)
+  c. For videos: file path is prepared but writing is deferred to `run_detect_video`, which writes concurrently with frame detection (AZ-177)
+  d. Creates media record via `POST /api/media`
+  e. Sets status to `AI_PROCESSING` via `PUT /api/media/{id}/status`
6. Runs `run_detect_image` or `run_detect_video` in ThreadPoolExecutor
7. On success: sets status to `AI_PROCESSED`
8. On failure: sets status to `ERROR`
@@ -80,6 +83,20 @@ Returns `HealthResponse` with `status="healthy"` always. `aiAvailability` reflec
8. Updates media status via `PUT /api/media/{id}/status`
9. Returns immediately: `{"status": "started", "mediaId": media_id}`

### `/detect/video` (streaming upload — AZ-178)

1. Parses `X-Filename`, `X-Config`, auth headers (no multipart — raw binary body)
2. Validates the video extension
3. Creates a `StreamingBuffer` backed by a temp file in `VIDEOS_DIR`
4. Starts the inference thread via `run_in_executor`: `run_detect_video_stream(buffer, ...)`
5. Reads HTTP body chunks via `request.stream()`, feeding each to `buffer.append()` via the executor
6. The inference thread reads from the same buffer concurrently — PyAV decodes frames as data arrives
7. Detections are broadcast to SSE queues in real time during the upload
8. After the upload completes: signals EOF, computes the content hash from the temp file (3 KB read), renames it to the permanent path
9. If authenticated: creates a media record, tracks the status lifecycle
10. Returns `{"status": "started", "mediaId": "..."}` — inference continues in a background task
11. The background task awaits inference completion and updates status to AI_PROCESSED or Error
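
Steps 4-6 and 11 boil down to a feed-while-consuming pattern. A minimal stdlib sketch (a plain queue stands in for `StreamingBuffer`, and `consume` for `run_detect_video_stream`; names are illustrative):

```python
import asyncio
import queue

def consume(q, sink: list) -> None:
    # Stand-in for run_detect_video_stream: process chunks until the EOF sentinel.
    while (chunk := q.get()) is not None:
        sink.append(chunk)

async def handle_upload(chunks) -> bytes:
    loop = asyncio.get_running_loop()
    q: "queue.Queue[bytes | None]" = queue.Queue()
    sink: list = []
    worker = loop.run_in_executor(None, consume, q, sink)  # step 4: start consumer thread
    for chunk in chunks:                                   # step 5: chunks arrive over HTTP
        await loop.run_in_executor(None, q.put, chunk)     # feed without blocking the loop
    await loop.run_in_executor(None, q.put, None)          # step 8: signal EOF
    await worker                                           # step 11: await completion
    return b"".join(sink)
```

The real handler feeds `request.stream()` chunks into a file-backed `StreamingBuffer` instead of a queue, so PyAV can also seek.
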

### `/detect/stream` (SSE)

- Creates an asyncio.Queue per client (maxsize=100)
@@ -99,7 +116,7 @@ Detections posts results to `POST {ANNOTATIONS_URL}/annotations` during async me
## Dependencies

- **External**: `asyncio`, `base64`, `io`, `json`, `os`, `tempfile`, `time`, `concurrent.futures`, `pathlib`, `typing`, `av`, `cv2`, `numpy`, `requests`, `fastapi`, `pydantic`
-- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation), `media_hash` (content hashing)
+- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation), `media_hash` (content hashing), `streaming_buffer` (streaming video upload)

## Consumers

@@ -141,4 +158,5 @@ None (entry point).

- `tests/test_az174_db_driven_config.py` — `decode_user_id`, `_merged_annotation_settings_payload`, `_resolve_media_for_detect`
- `tests/test_az175_api_calls.py` — `_post_media_record`, `_put_media_status`
- `tests/test_az177_video_single_write.py` — video single-write, image unchanged, concurrent writer thread, temp cleanup
- `e2e/tests/test_*.py` — full API e2e tests (health, single image, video, async, SSE, negative, security, performance, resilience)

@@ -9,6 +9,7 @@ Content-based hashing for media files using XxHash64 with a deterministic sampli
| Function | Signature | Description |
|----------|-----------|-------------|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns hex XxHash64 digest of sampled content. If `virtual=True`, prefixes it with "V". |
+| `compute_media_content_hash_from_file` | `(path: str, virtual: bool = False) -> str` | Same algorithm, but reads the sampling regions directly from a file on disk — only 3 KB of I/O regardless of file size. Produces hashes identical to the bytes-based version. (AZ-178) |
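
The "identical hashes from bytes or file" property can be illustrated with a stdlib stand-in (blake2b instead of XxHash64, and an assumed three-region 1 KB sampling; the real region layout is internal to `media_hash`):

```python
import hashlib
import os

REGION = 1024  # assumed sample size per region; the real layout may differ

def _regions(size: int):
    # Three deterministic sample windows: start, middle, end (~3 KB total).
    yield 0
    yield max(0, size // 2 - REGION // 2)
    yield max(0, size - REGION)

def hash_bytes(data: bytes, virtual: bool = False) -> str:
    h = hashlib.blake2b(digest_size=8)
    h.update(len(data).to_bytes(8, "big"))  # mix in the total size
    for off in _regions(len(data)):
        h.update(data[off:off + REGION])
    return ("V" if virtual else "") + h.hexdigest()

def hash_file(path: str, virtual: bool = False) -> str:
    size = os.path.getsize(path)
    h = hashlib.blake2b(digest_size=8)
    h.update(size.to_bytes(8, "big"))
    with open(path, "rb") as f:
        for off in _regions(size):          # seek + read: ~3 KB of I/O total
            f.seek(off)
            h.update(f.read(REGION))
    return ("V" if virtual else "") + h.hexdigest()
```

Because both variants hash the same sampled windows, the file-based function never needs to load the whole file.
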

## Internal Logic

@@ -27,7 +28,7 @@ The sampling avoids reading the full file through the hash function while still

## Consumers

-- `main` — computes content hash for uploaded media in `POST /detect` to use as the media record ID and storage filename
+- `main` — computes content hash for uploaded media in `POST /detect` (bytes version) and `POST /detect/video` (file version) to use as the media record ID and storage filename

## Data Models

@@ -48,3 +49,4 @@ None. The hash is non-cryptographic (fast, not tamper-resistant).
## Tests

- `tests/test_media_hash.py` — covers small files, large files, and virtual prefix behavior
+- `tests/test_az178_streaming_video.py::TestMediaContentHashFromFile` — verifies the file-based hash matches the bytes-based hash for small, large, boundary, and virtual cases

@@ -0,0 +1,82 @@
# Module: streaming_buffer

## Purpose

File-like object backed by a temp file that supports concurrent append (write) and read+seek (read) from separate threads. Designed for true streaming video detection: the HTTP handler appends incoming chunks while the inference thread reads and decodes frames via PyAV — simultaneously, without buffering the entire file in memory.

## Public Interface

### Class: StreamingBuffer

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(temp_dir: str \| None = None)` | Creates a temp file in `temp_dir`; opens separate write and read handles |
| `append` | `(data: bytes) -> None` | Writes data to the temp file, flushes, notifies waiting readers |
| `close_writer` | `() -> None` | Signals EOF — wakes all blocked readers |
| `read` | `(size: int = -1) -> bytes` | Reads up to `size` bytes; blocks if data is not yet available; returns `b""` on EOF |
| `seek` | `(offset: int, whence: int = 0) -> int` | Seeks the reader position; SEEK_END blocks until EOF is signaled |
| `tell` | `() -> int` | Returns the current reader position |
| `readable` | `() -> bool` | Always returns `True` |
| `seekable` | `() -> bool` | Always returns `True` |
| `writable` | `() -> bool` | Always returns `False` |
| `close` | `() -> None` | Closes both file handles |

### Properties

| Property | Type | Description |
|----------|------|-------------|
| `path` | `str` | Absolute path to the backing temp file |
| `written` | `int` | Total bytes appended so far |

## Internal Logic

### Thread Coordination

Uses `threading.Condition` to synchronize one writer (HTTP handler) and one reader (PyAV/inference thread):

- **append()**: acquires lock → writes to file → flushes → increments `_written` → `notify_all()` → releases lock
- **read(size)**: acquires lock → checks if data is available → if not and not EOF, calls `wait()` (releases lock, sleeps) → woken by `notify_all()` → calculates bytes to read → releases lock → reads from file (outside lock)
- **seek(0, 2)** (SEEK_END): acquires lock → if EOF is not signaled, calls `wait()` in a loop → once EOF, delegates to `_reader.seek(offset, 2)`

The file read itself happens **outside** the lock to avoid holding the lock during I/O.
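
A minimal sketch of this coordination, simplified from the description above (not the module's actual code):

```python
import os
import tempfile
import threading

class StreamingBufferSketch:
    """Two handles on one temp file: the writer appends, the reader blocks until data or EOF."""

    def __init__(self, temp_dir=None):
        fd, self.path = tempfile.mkstemp(dir=temp_dir)
        os.close(fd)
        self._writer = open(self.path, "wb")
        self._reader = open(self.path, "rb")
        self._cond = threading.Condition()
        self._written = 0
        self._eof = False

    def append(self, data: bytes) -> None:
        with self._cond:
            self._writer.write(data)
            self._writer.flush()           # make bytes visible to the reader fd
            self._written += len(data)
            self._cond.notify_all()        # wake blocked readers

    def close_writer(self) -> None:
        with self._cond:
            self._eof = True
            self._cond.notify_all()

    def read(self, size: int = -1) -> bytes:
        with self._cond:
            while self._reader.tell() >= self._written and not self._eof:
                self._cond.wait()          # no data ahead of us yet: sleep
            available = self._written - self._reader.tell()
            to_read = available if size < 0 else min(size, available)
        return self._reader.read(to_read)  # actual file I/O outside the lock

    def seek(self, offset: int, whence: int = 0) -> int:
        if whence == os.SEEK_END:
            with self._cond:
                while not self._eof:       # size is only final once EOF is signaled
                    self._cond.wait()
        return self._reader.seek(offset, whence)

    def close(self) -> None:
        self._writer.close()
        self._reader.close()
```
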

### File Handle Separation

Two independent file descriptors on the same temp file:
- `_writer` opened with `"wb"` — append-only, used by the HTTP handler
- `_reader` opened with `"rb"` — seekable, used by PyAV

On POSIX systems, writes flushed by one fd are immediately visible to reads on another fd of the same inode (shared kernel page cache). `os.rename()` on the path while the reader fd is open is safe — the fd retains access to the underlying inode.

### SEEK_END Behavior

When PyAV tries to seek to the end of the file (e.g. to find the MP4 moov atom), `seek(0, 2)` blocks until `close_writer()` is called. This provides graceful degradation for non-faststart MP4 files: the decoder waits for the full upload, then processes normally. For faststart MP4/MKV/WebM, SEEK_END is never called and frames are decoded immediately.

## Dependencies

- **External**: `os`, `tempfile`, `threading`
- **Internal**: none (leaf module)

## Consumers

- `main` — creates `StreamingBuffer` in `POST /detect/video`, feeds chunks via `append()`, passes the buffer to inference

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None. Temp file permissions follow OS defaults (`tempfile.mkstemp`).

## Tests

- `tests/test_az178_streaming_video.py::TestStreamingBuffer` — sequential write/read, blocking read, EOF, concurrent chunked read/write, seek set, seek end blocking, tell, file persistence, written property, seekable/readable flags

@@ -10,6 +10,7 @@
| F4 | SSE Event Streaming | Client GET /detect/stream | API | Medium |
| F5 | Engine Initialization | First detection request | Inference Pipeline, Engines, Loader | High |
| F6 | TensorRT Background Conversion | No pre-built TensorRT engine | Inference Pipeline, Engines, Loader | Medium |
+| F7 | Streaming Video Detection | Client POST /detect/video | API, StreamingBuffer, Inference Pipeline, Engines, Domain, Annotations | High |

## Flow Dependencies

@@ -18,9 +19,10 @@
| F1 | F5 (for meaningful status) | — |
| F2 | F5 (engine must be ready) | Annotations (media lifecycle) |
| F3 | F5 (engine must be ready) | F4 (via SSE event queues), Annotations (settings, media lifecycle) |
-| F4 | — | F3 (receives events) |
+| F4 | — | F3, F7 (receives events) |
| F5 | — | F6 (triggers conversion if needed) |
| F6 | F5 (triggered by init failure) | F5 (provides converted bytes) |
+| F7 | F5 (engine must be ready) | F4 (via SSE event queues), Annotations (media lifecycle) |

---

@@ -317,3 +319,255 @@ sequenceDiagram
    INF->>STATUS: set_status(ENABLED)
    Note over INF: Next init_ai() call will load from _converted_model_bytes
```

---

## Flow F7: Streaming Video Detection (AZ-178)

### Description

Client uploads a video file as raw binary and gets near-real-time detections via SSE as frames are decoded — **during** the upload, not after. The endpoint bypasses FastAPI's multipart buffering entirely, using `request.stream()` to read the HTTP body chunk by chunk. Each chunk is simultaneously written to a temp file (via `StreamingBuffer`) and read by PyAV in a background inference thread. First detections appear within ~500 ms of the first decodable frames arriving at the API. Peak memory usage is bounded by the model batch size × frame size (tens of MB), regardless of video file size.

### Activity Diagram — Full Data Pipeline

```mermaid
flowchart TD
    subgraph CLIENT ["Client (Browser)"]
        C1([Open SSE connection<br/>GET /detect/stream])
        C2([Start upload<br/>POST /detect/video])
        C3([Receive SSE events<br/>during upload])
    end

    subgraph API ["API Layer — main.py (async event loop)"]
        A1[Parse headers:<br/>X-Filename, X-Config, Auth]
        A2{Valid video<br/>extension?}
        A3[Create StreamingBuffer<br/>backed by temp file]
        A4[Start inference thread<br/>via run_in_executor]
        A5["Read chunk from<br/>request.stream()"]
        A6[buffer.append chunk<br/>via run_in_executor]
        A7{More chunks?}
        A8[buffer.close_writer<br/>signal EOF]
        A9[Compute content hash<br/>from temp file on disk<br/>reads only 3 KB]
        A10[Rename temp file →<br/>permanent storage path]
        A11[Create media record<br/>POST /api/media]
        A12["Return {status: started,<br/>mediaId: hash}"]
        A13[Register background task<br/>to await inference completion]
    end

    subgraph BUF ["StreamingBuffer — streaming_buffer.py"]
        B1[/"Temp file on disk<br/>(single file, two handles)"/]
        B2["append(data):<br/>write + flush + notify"]
        B3["read(size):<br/>block if ahead of writer<br/>return available bytes"]
        B4["seek(offset, whence):<br/>SEEK_END blocks until EOF"]
        B5["close_writer():<br/>set EOF flag, notify all"]
    end

    subgraph INF ["Inference Thread — inference.pyx"]
        I1["av.open(buffer)<br/>PyAV reads via buffer.read()"]
        I2{Moov at start?}
        I3[Decode frames immediately<br/>~500ms latency]
        I4["Blocks on seek(0, 2)<br/>until upload completes"]
        I5["Decode batch of frames<br/>(frame_period_recognition sampling)"]
        I6["engine.process_frames(batch)"]
        I7{Detections found?}
        I8["on_annotation callback<br/>→ SSE event broadcast"]
        I9{More frames?}
        I10[send_detection_status]
    end

    C2 --> A1
    A1 --> A2
    A2 -->|No| ERR([400 Bad Request])
    A2 -->|Yes| A3
    A3 --> A4
    A4 --> A5

    A5 --> A6
    A6 --> B2
    B2 --> B1
    A6 --> A7
    A7 -->|Yes| A5
    A7 -->|No| A8
    A8 --> B5

    A8 --> A9
    A9 --> A10
    A10 --> A11
    A11 --> A12
    A12 --> A13

    A4 -.->|background thread| I1
    I1 --> I2
    I2 -->|"Yes (faststart MP4,<br/>MKV, WebM)"| I3
    I2 -->|"No (standard MP4)"| I4
    I4 --> I3
    I3 --> I5
    I5 --> I6
    I6 --> I7
    I7 -->|Yes| I8
    I8 --> C3
    I7 -->|No| I9
    I8 --> I9
    I9 -->|Yes| I5
    I9 -->|No| I10

    B3 -.->|"PyAV calls<br/>read()"| I1

    style BUF fill:#e8f4fd,stroke:#2196F3
    style INF fill:#fce4ec,stroke:#e91e63
    style API fill:#e8f5e9,stroke:#4CAF50
    style CLIENT fill:#fff3e0,stroke:#FF9800
```

### Sequence Diagram — Concurrent Timeline

```mermaid
sequenceDiagram
    participant Client
    participant SSE as SSE /detect/stream
    participant API as main.py (async)
    participant BUF as StreamingBuffer
    participant INF as Inference Thread
    participant PyAV
    participant ENG as Engine (ONNX/TRT)
    participant ANN as Annotations Service

    Client->>SSE: GET /detect/stream (open)
    Client->>API: POST /detect/video (raw body, streaming)
    API->>API: Parse X-Filename, X-Config, Auth headers
    API->>BUF: Create StreamingBuffer (temp file)
    API->>INF: Start in executor thread

    par Upload stream (async event loop) and Inference (background thread)
        loop Each HTTP body chunk (~8-64 KB)
            API->>BUF: append(chunk) → write + flush + notify
        end

        INF->>PyAV: av.open(buffer)
        Note over PyAV,BUF: PyAV calls buffer.read().<br/>Blocks when no data yet.<br/>Resumes as chunks arrive.

        loop Each decodable frame batch
            PyAV->>BUF: read(size) → returns available bytes
            BUF-->>PyAV: video data
            PyAV-->>INF: decoded frames (BGR numpy)
            INF->>ENG: process_frames(batch)
            ENG-->>INF: detections
            opt Valid detections
                INF->>SSE: DetectionEvent (via callback)
                SSE-->>Client: data: {...detections...}
            end
        end
    end

    API->>BUF: close_writer() → EOF signal
    Note over INF: PyAV reads remaining frames, finishes

    API->>API: compute_media_content_hash_from_file(temp file) — reads 3 KB
    API->>API: Rename temp file → {hash}{ext}

    opt Authenticated user
        API->>ANN: POST /api/media (create record)
        API->>ANN: PUT /api/media/{id}/status (AI_PROCESSING)
    end

    API-->>Client: {"status": "started", "mediaId": "abc123"}

    Note over API: Background task awaits inference completion

    INF-->>API: Inference completes
    opt Authenticated user
        API->>ANN: PUT /api/media/{id}/status (AI_PROCESSED)
    end
    API->>SSE: DetectionEvent(status=AIProcessed, percent=100)
    SSE-->>Client: data: {...status: AIProcessed...}
```

### Flowchart — StreamingBuffer Read/Write Coordination

```mermaid
flowchart TD
    subgraph WRITER ["Writer (HTTP handler thread)"]
        W1["Receive HTTP chunk"]
        W2["Acquire Condition lock"]
        W3["file.write(chunk) + flush()"]
        W4["_written += len(chunk)"]
        W5["notify_all() → wake reader"]
        W6["Release lock"]
        W7{More chunks?}
        W8["close_writer():<br/>set _eof = True<br/>notify_all()"]
    end

    subgraph READER ["Reader (PyAV / Inference thread)"]
        R1["PyAV calls read(size)"]
        R2["Acquire Condition lock"]
        R3{"_written > pos?"}
        R4["cond.wait()<br/>(releases lock, sleeps)"]
        R5["Calculate to_read =<br/>min(size, available)"]
        R6["Release lock"]
        R7["file.read(to_read)<br/>(outside lock)"]
        R8["Return bytes to PyAV"]
        R9{"_eof and<br/>available == 0?"}
        R10["Return b'' (EOF)"]
    end

    W1 --> W2 --> W3 --> W4 --> W5 --> W6 --> W7
    W7 -->|Yes| W1
    W7 -->|No| W8

    R1 --> R2 --> R3
    R3 -->|Yes| R5
    R3 -->|No| R9
    R9 -->|Yes| R10
    R9 -->|No| R4
    R4 -.->|"Woken by<br/>notify_all()"| R3
    R5 --> R6 --> R7 --> R8

    style WRITER fill:#e8f5e9,stroke:#4CAF50
    style READER fill:#fce4ec,stroke:#e91e63
```

### Data Flow

| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | Client | API | Raw video bytes (streaming) | HTTP POST body chunks |
| 2 | API | StreamingBuffer | Byte chunks (8-64 KB each) | `append(bytes)` |
| 3 | StreamingBuffer | Temp file | Same chunks | `file.write()` + `flush()` |
| 4 | StreamingBuffer | PyAV (Inference thread) | Byte segments on demand | `read(size)` blocks when ahead |
| 5 | PyAV | Inference | Decoded BGR numpy frames | ndarray |
| 6 | Inference | Engine | Preprocessed batch | ndarray |
| 7 | Engine | Inference | Raw detections | ndarray |
| 8 | Inference | SSE clients | DetectionEvent | SSE JSON via `loop.call_soon_threadsafe` |
| 9 | API | Temp file | Content hash (3 KB read) | `compute_media_content_hash_from_file` |
| 10 | API | Disk | Rename temp → permanent path | `os.rename` |
| 11 | API | Annotations Service | Media record + status | HTTP POST/PUT JSON |

### Memory Profile (2 GB video)

| Stage | Current (F2) | Streaming (F7) |
|-------|-------------|----------------|
| Starlette buffering | 2 GB (SpooledTemporaryFile) | 0 (raw stream) |
| `file.read()` / chunk buffer | 2 GB (full bytes) | ~64 KB (one chunk) |
| BytesIO for PyAV | 2 GB (copy) | 0 (reads from buffer) |
| Writer thread | 2 GB (same ref) | 0 (no separate writer) |
| **Peak process RAM** | **~4+ GB** | **~50 MB** (batch × frame) |

### Format Compatibility

| Container Format | Moov Location | Streaming Behavior |
|-----------------|--------------|-------------------|
| MP4 (faststart) | Beginning | True streaming — first frame decoded in ~500 ms |
| MKV / WebM | Beginning | True streaming — first frame decoded in ~500 ms |
| MP4 (standard) | End of file | Graceful degradation — `seek(0, 2)` blocks until upload completes, then decoding starts |
| MOV, AVI | Varies | Depends on header location |

### Error Scenarios

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Non-video extension | API | Extension check | 400 Bad Request |
| Client disconnects mid-upload | `request.stream()` | Exception | `buffer.close_writer()` called in the except block; inference thread gets EOF |
| Engine unavailable | Inference thread | engine is None | Error event via SSE |
| PyAV decode failure | Inference thread | Exception | Error event via SSE; media status set to Error |
| Disk full | StreamingBuffer.append | OSError | Propagated to API handler |
| Annotations service down | `_post_media_record` | Exception caught | Silently continues; detections still work |