mirror of
https://github.com/azaion/detections.git
synced 2026-04-23 02:16:31 +00:00
1fe9425aa8
Module: media_hash
Purpose
Content-based hashing for media files using XxHash64 with a deterministic sampling algorithm. Produces a stable ID for any media file, derived from its size and sampled content; the same bytes always yield the same ID.
Public Interface
| Function | Signature | Description |
|---|---|---|
| compute_media_content_hash | (data: bytes, virtual: bool = False) -> str | Returns the hex XxHash64 digest of the sampled content. If virtual=True, the result is prefixed with "V". |
Internal Logic
Sampling Algorithm (_sampling_payload)
- Small files (< 3072 bytes): uses entire content
- Large files (≥ 3072 bytes): samples 3 × 1024-byte windows: first 1024, middle 1024, last 1024
- All payloads are prefixed with the 8-byte little-endian file size for collision resistance
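Sampling avoids passing the full file through the hash function while keeping collisions unlikely: the head, middle, and tail windows capture format headers, a slice of content, and EOF markers. The trade-off is that two large files of identical size that differ only outside those three windows produce the same hash.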
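The sampling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the name sampling_payload, the constants, and in particular the exact position of the middle window (centred on the file midpoint here) are assumptions, since the description only says "middle 1024".

```python
import struct

WINDOW = 1024            # bytes per sampled window (from the description above)
THRESHOLD = 3 * WINDOW   # 3072: files smaller than this are used in full


def sampling_payload(data: bytes) -> bytes:
    """Return the bytes to be hashed: size prefix followed by sampled content."""
    size = len(data)
    prefix = struct.pack("<Q", size)  # 8-byte little-endian file size

    if size < THRESHOLD:
        # Small file: the entire content is part of the payload.
        return prefix + data

    # Assumption: the middle window is centred on the file's midpoint.
    mid_start = (size - WINDOW) // 2
    head = data[:WINDOW]
    middle = data[mid_start:mid_start + WINDOW]
    tail = data[-WINDOW:]
    return prefix + head + middle + tail
```

With the xxhash package, the payload would presumably then be hashed with something like xxhash.xxh64(payload).hexdigest(), with "V" prepended when virtual=True. Note that for large files the payload is always 3080 bytes (8 + 3 × 1024), regardless of file size.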
Dependencies
- External: xxhash (pinned at 3.5.0 in requirements.txt)
- Internal: none (leaf module)
Consumers
- main: computes the content hash for uploaded media in POST /detect and uses it as the media record ID and storage filename
Data Models
None.
Configuration
None.
External Integrations
None.
Security
None. The hash is non-cryptographic (fast, not tamper-resistant).
Tests
- tests/test_media_hash.py: covers small files, large files, and virtual prefix behavior