[AZ-178] Implement streaming video detection endpoint

- Added `POST /detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to process video from a file-like object.
- Added `compute_media_content_hash_from_file` to media hashing, which computes the content hash directly from a file on disk with minimal I/O.
- Updated documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-04-01 03:11:43 +03:00
parent e65d8da6a3
commit be4cab4fcb
42 changed files with 2983 additions and 29 deletions
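The commit message says `POST /detect/video` lets inference start as upload bytes arrive, with `run_detect_video_stream` working from a file-like object. That implementation is not part of this diff. The following is therefore only a minimal sketch of the general pattern, under stated assumptions. `run_detect_stream_sketch`, `simulate_streaming_upload` and the 4 KB read size are hypothetical. An OS pipe stands in for the upload stream, and `on_chunk` stands in for whatever inference step consumes the bytes.

```python
import os
import threading

CHUNK_SIZE = 4096  # hypothetical read size per iteration


def run_detect_stream_sketch(stream, on_chunk):
    """Hypothetical stand-in for run_detect_video_stream.

    Reads the file-like `stream` incrementally and passes each chunk to
    `on_chunk` (a stand-in for inference) as soon as it is readable,
    instead of waiting for the whole upload. Returns the total bytes seen.
    """
    total = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:  # b"" means the writer closed the stream (EOF)
            break
        on_chunk(chunk)
        total += len(chunk)
    return total


def simulate_streaming_upload():
    """Feed a pipe from a thread. The second half is only sent after the
    consumer has handled the first, showing processing starts mid-upload."""
    read_fd, write_fd = os.pipe()
    first_chunk_seen = threading.Event()
    state = {}

    def uploader():
        with os.fdopen(write_fd, "wb") as sink:
            sink.write(b"a" * 1000)
            sink.flush()  # push the first part into the pipe now
            # True only if the consumer processed data before EOF.
            state["started_early"] = first_chunk_seen.wait(timeout=5)
            sink.write(b"b" * 1000)
        # Leaving the with-block closes the write end, signalling EOF.

    received = []

    def on_chunk(chunk):
        received.append(chunk)
        first_chunk_seen.set()

    thread = threading.Thread(target=uploader)
    thread.start()
    # buffering=0 yields a raw FileIO whose read() makes one system call
    # and returns whatever bytes are currently available.
    with os.fdopen(read_fd, "rb", buffering=0) as source:
        total = run_detect_stream_sketch(source, on_chunk)
    thread.join()
    return total, state["started_early"], b"".join(received)
```

The unbuffered reader is the key design choice here. `FileIO.read(n)` returns as soon as any bytes are available. `BufferedReader.read(n)` on a blocking stream keeps reading until it has `n` bytes or reaches EOF, which would delay the first chunk. `BufferedReader.read1(n)` is the buffered alternative that makes at most one raw read.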
+3 -1
@@ -9,6 +9,7 @@ Content-based hashing for media files using XxHash64 with a deterministic sampli
| Function | Signature | Description |
|----------|-----------|-------------|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns hex XxHash64 digest of sampled content. If `virtual=True`, prefixes with "V". |
+| `compute_media_content_hash_from_file` | `(path: str, virtual: bool = False) -> str` | Same algorithm but reads sampling regions directly from a file on disk — only 3 KB I/O regardless of file size. Produces identical hashes to the bytes-based version. (AZ-178) |
## Internal Logic
@@ -27,7 +28,7 @@ The sampling avoids reading the full file through the hash function while still
## Consumers
-- `main` — computes content hash for uploaded media in `POST /detect` to use as the media record ID and storage filename
+- `main` — computes content hash for uploaded media in `POST /detect` (bytes version) and `POST /detect/video` (file version) to use as the media record ID and storage filename
## Data Models
@@ -48,3 +49,4 @@ None. The hash is non-cryptographic (fast, not tamper-resistant).
## Tests
- `tests/test_media_hash.py` — covers small files, large files, and virtual prefix behavior
+- `tests/test_az178_streaming_video.py::TestMediaContentHashFromFile` — verifies file-based hash matches bytes-based hash for small, large, boundary, and virtual cases
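The table says `compute_media_content_hash_from_file` reads only the sampling regions from disk yet matches the bytes-based `compute_media_content_hash`, and the new tests check exactly that. The sampling layout itself is cut off in this diff, so the sketch below assumes one: three 1 KB regions (start, middle, end) plus the total length, with inputs of 3 KB or less hashed whole. It uses `hashlib.blake2b` with an 8-byte digest in place of XxHash64 so it runs on the standard library alone. `sampled_hash_bytes` and `sampled_hash_file` are hypothetical names. Both variants derive the same region list from the size alone, so seeking to those regions on disk yields the same digest as slicing them from memory.

```python
import hashlib
import os

# Assumed parameters: the real sampling layout is not shown in this diff.
SAMPLE_SIZE = 1024                  # bytes per sampled region (3 regions -> 3 KB)
FULL_HASH_LIMIT = 3 * SAMPLE_SIZE   # inputs this small are hashed in full


def _regions(size):
    """(offset, length) pairs to hash for an input of `size` bytes."""
    if size <= FULL_HASH_LIMIT:
        return [(0, size)]
    middle = (size - SAMPLE_SIZE) // 2
    return [(0, SAMPLE_SIZE), (middle, SAMPLE_SIZE), (size - SAMPLE_SIZE, SAMPLE_SIZE)]


def _finish(size, chunks, virtual):
    # blake2b with an 8-byte digest is a stdlib stand-in for XxHash64.
    h = hashlib.blake2b(digest_size=8)
    h.update(size.to_bytes(8, "little"))  # assumed: total length is mixed in
    for chunk in chunks:
        h.update(chunk)
    digest = h.hexdigest()
    return "V" + digest if virtual else digest


def sampled_hash_bytes(data, virtual=False):
    """Bytes-based variant: slices the sampled regions out of memory."""
    chunks = (data[off:off + length] for off, length in _regions(len(data)))
    return _finish(len(data), chunks, virtual)


def sampled_hash_file(path, virtual=False):
    """File-based variant: seeks to each region, requesting at most 3 KB."""
    size = os.path.getsize(path)
    chunks = []
    with open(path, "rb") as f:
        for offset, length in _regions(size):
            f.seek(offset)
            chunks.append(f.read(length))
    return _finish(size, chunks, virtual)
```

Because only the sampled regions feed the digest, a byte outside them can change without changing the hash. This is consistent with the documentation describing the hash as non-cryptographic and not tamper-resistant.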