mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 21:46:31 +00:00
be4cab4fcb
- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
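# Module: media_hash

## Purpose

Content-based hashing for media files using XXH64 (64-bit xxHash) with a deterministic sampling algorithm. Produces a stable ID derived from a file's size and sampled content: the same bytes always yield the same ID. Because only part of a large file is hashed, the ID is not guaranteed to be unique; two files of equal size that differ only outside the sampled regions produce the same hash.
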
## Public Interface

| Function | Signature | Description |
|----------|-----------|-------------|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns the hex XXH64 digest of the sampled content. If `virtual=True`, the result is prefixed with `"V"`. |
| `compute_media_content_hash_from_file` | `(path: str, virtual: bool = False) -> str` | Same algorithm, but reads the sampling regions directly from a file on disk, so it reads at most 3 KiB (3072 bytes) regardless of file size. Produces the same hash as the bytes-based version for the same content. (AZ-178) |
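
The file-based variant can be sketched as below, assuming the sampling regions are the first, middle, and last 1024 bytes for files of at least 3072 bytes, and the whole file otherwise. This is an illustration, not the module's actual code: the names `hash_from_file_sketch` and `_digest64` are hypothetical, and the middle-window offset `(size - 1024) // 2` is an assumption, since the exact offset is not stated here. The `hashlib.blake2b` fallback is only a stand-in so the sketch runs without `xxhash` installed; it produces different values than XXH64.

```python
import hashlib
import os
import struct

try:
    import xxhash  # third-party dependency named in this document
except ImportError:
    xxhash = None  # the stdlib stand-in below is used instead

WINDOW = 1024
THRESHOLD = 3 * WINDOW  # 3072 bytes


def _digest64(payload: bytes) -> str:
    """Return a 16-character hex digest (XXH64 when xxhash is installed)."""
    if xxhash is not None:
        return xxhash.xxh64(payload).hexdigest()
    # Stand-in only: same digest length, but NOT the same values as XXH64.
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


def hash_from_file_sketch(path: str, virtual: bool = False) -> str:
    """Hash a file by reading at most three 1024-byte regions from disk."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size < THRESHOLD:
            sampled = f.read()  # small file: use the entire content
        else:
            head = f.read(WINDOW)
            f.seek((size - WINDOW) // 2)  # assumed middle-window offset
            middle = f.read(WINDOW)
            f.seek(size - WINDOW)
            tail = f.read(WINDOW)
            sampled = head + middle + tail
    # Prefix with the file size as an 8-byte little-endian integer.
    payload = struct.pack("<Q", size) + sampled
    digest = _digest64(payload)
    return "V" + digest if virtual else digest
```

Seeking to each region keeps the read volume bounded by 3072 bytes, which is what makes a variant like this suitable for large uploaded videos that have already been written to disk.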
## Internal Logic

### Sampling Algorithm (`_sampling_payload`)

- **Small files** (< 3072 bytes): uses the entire content
- **Large files** (≥ 3072 bytes): samples three 1024-byte windows: the first 1024 bytes, the middle 1024 bytes, and the last 1024 bytes
- All payloads are prefixed with the file size as an 8-byte little-endian integer, so files of different sizes always produce different payloads

Sampling keeps the amount of data fed to the hash function constant for large files while still distinguishing most files in practice: the head, middle, and tail windows capture format headers, a slice of the content, and EOF markers.
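
The rules above can be sketched for the bytes-based case as follows. This is a minimal illustration under stated assumptions, not the module's actual `_sampling_payload`: the name `sampling_payload_sketch` is hypothetical, and the middle window is assumed to start at `(n - 1024) // 2`, since the exact offset is not given here.

```python
import struct

WINDOW = 1024
THRESHOLD = 3 * WINDOW  # 3072 bytes


def sampling_payload_sketch(data: bytes) -> bytes:
    """Build the bytes that would be hashed: size prefix + sampled content."""
    n = len(data)
    prefix = struct.pack("<Q", n)  # 8-byte little-endian file size
    if n < THRESHOLD:
        return prefix + data  # small input: use everything
    mid = (n - WINDOW) // 2  # assumed start of the middle window
    return prefix + data[:WINDOW] + data[mid:mid + WINDOW] + data[n - WINDOW:]


small = bytes(100)
large = bytes(range(256)) * 40  # 10,240 bytes

print(len(sampling_payload_sketch(small)))  # 108  (8 + 100)
print(len(sampling_payload_sketch(large)))  # 3080 (8 + 3 * 1024)

# Same size, one byte changed outside all three windows: identical payload.
edited = bytearray(large)
edited[2000] ^= 0xFF
same_payload = sampling_payload_sketch(bytes(edited)) == sampling_payload_sketch(large)
print(same_payload)  # True
```

The last check shows the trade-off of sampling: equal-sized inputs that differ only outside the sampled windows are indistinguishable to the hash. With the assumed middle offset, an input of exactly 3072 bytes has three contiguous windows that cover the whole content, so the small-file and large-file branches agree at the boundary.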
## Dependencies

- **External**: `xxhash` (pinned at 3.5.0 in requirements.txt)
- **Internal**: none (leaf module)

## Consumers

- `main`: computes the content hash for uploaded media in `POST /detect` (bytes version) and `POST /detect/video` (file version) to use as the media record ID and storage filename

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None. The hash is non-cryptographic: it is fast but not tamper-resistant, so it should not be relied on for integrity or authenticity checks.

## Tests

- `tests/test_media_hash.py`: covers small files, large files, and virtual prefix behavior
- `tests/test_az178_streaming_video.py::TestMediaContentHashFromFile`: verifies that the file-based hash matches the bytes-based hash for small, large, boundary, and virtual cases