Mirror of https://github.com/azaion/detections.git, synced 2026-04-22 10:56:32 +00:00

[AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow

- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
# Module: media_hash

## Purpose

Content-based hashing for media files: XxHash64 is applied to a deterministic sample of the file's bytes. Identical content always yields the same ID, and different files are very likely (but not guaranteed) to yield different IDs.

## Public Interface
| Function | Signature | Description |
|----------|-----------|-------------|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns the hex XxHash64 digest of the sampled payload. If `virtual=True`, the digest is prefixed with `"V"`. |

## Internal Logic

### Sampling Algorithm (`_sampling_payload`)
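- **Small files** (< 3072 bytes): the entire content is used.
- **Large files** (≥ 3072 bytes): three 1024-byte windows are sampled: the first 1024 bytes, the middle 1024 bytes, and the last 1024 bytes.
- Every payload is prefixed with the file size as an 8-byte little-endian integer, so files whose sampled windows match but whose sizes differ still produce different payloads.

Sampling bounds the hash input to at most 3080 bytes (8 + 3 × 1024) regardless of file size, while the head, middle, and tail windows capture format headers, representative content, and end-of-file markers. The trade-off is that bytes outside the three windows do not influence the hash: two files of the same size that differ only there receive the same ID.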
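The sampling rules above can be sketched in Python as follows. This is a reconstruction from the description, not the module's actual `_sampling_payload`: the unsigned 64-bit encoding of the size prefix (`struct` format `"<Q"`) and the middle-window start offset `(size - 1024) // 2` are assumptions, and the names `sampling_payload`, `WINDOW`, and `SAMPLE_THRESHOLD` are hypothetical.

```python
import struct

WINDOW = 1024                  # bytes per sampled window
SAMPLE_THRESHOLD = 3 * WINDOW  # 3072: smaller files are used in full


def sampling_payload(data: bytes) -> bytes:
    """Build the bytes that would be fed to XxHash64 (hypothetical sketch of `_sampling_payload`)."""
    # Assumption: the size prefix is an unsigned 64-bit little-endian integer.
    size_prefix = struct.pack("<Q", len(data))
    if len(data) < SAMPLE_THRESHOLD:
        return size_prefix + data  # small file: whole content
    # Assumption: the middle window is centered, with its start offset rounded down.
    mid = (len(data) - WINDOW) // 2
    head = data[:WINDOW]
    middle = data[mid:mid + WINDOW]
    tail = data[-WINDOW:]
    return size_prefix + head + middle + tail


if __name__ == "__main__":
    # At exactly 3072 bytes the three windows tile the file, so both branches agree.
    boundary = bytes(range(256)) * 12  # 3072 bytes
    assert sampling_payload(boundary) == struct.pack("<Q", 3072) + boundary

    # A large file is always reduced to 8 + 3 * 1024 bytes.
    large = bytes(range(256)) * 40  # 10,240 bytes
    assert len(sampling_payload(large)) == 8 + 3 * WINDOW

    # Flipping a byte between the head and middle windows leaves the payload unchanged.
    edited = bytearray(large)
    edited[2000] ^= 0xFF
    assert sampling_payload(bytes(edited)) == sampling_payload(large)
```

Because the payload never exceeds 3080 bytes, the work done by the hash function stays constant as files grow; the last assertion shows the cost of that choice, since an edit falling between windows is invisible to the hash.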
## Dependencies

- **External**: `xxhash` (pinned to 3.5.0 in `requirements.txt`)
- **Internal**: none (leaf module)

## Consumers

- `main`: computes the content hash of media uploaded via `POST /detect` and uses it as the media record ID and storage filename

## Data Models
None.

## Configuration

None.

## External Integrations

None.

## Security

No security role. The hash is non-cryptographic (fast, but not tamper-resistant), so it is unsuitable for integrity or authenticity checks. Because of sampling, any edit that preserves the file size and stays outside the three sampled windows leaves the hash unchanged.
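To make the tamper-resistance caveat concrete, the hypothetical helper below lists the byte ranges of a file that never reach the hash. It assumes the window layout from the sampling description (head at offset 0, tail at `size - 1024`) plus an assumed middle-window start of `(size - 1024) // 2`; `unsampled_ranges` is an illustration, not part of the module.

```python
WINDOW = 1024  # bytes per sampled window


def unsampled_ranges(size: int) -> list[tuple[int, int]]:
    """Half-open [start, end) byte ranges that do not influence the hash (hypothetical helper)."""
    if size < 3 * WINDOW:
        return []  # small files are hashed in full
    # Assumption: centered middle window, start offset rounded down.
    mid = (size - WINDOW) // 2
    gaps = [(WINDOW, mid), (mid + WINDOW, size - WINDOW)]
    # Drop empty gaps, e.g. at exactly 3072 bytes the windows tile the file.
    return [(start, end) for start, end in gaps if start < end]


if __name__ == "__main__":
    size = 10_240
    gaps = unsampled_ranges(size)
    print(gaps)  # [(1024, 4608), (5632, 9216)]
    unhashed = sum(end - start for start, end in gaps)
    print(f"{unhashed} of {size} bytes never reach the hash")
```

Under these assumptions, a file of 3072 bytes or more has exactly `size - 3072` unhashed bytes, and any same-size edit confined to them keeps the media ID unchanged.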
## Tests

- `tests/test_media_hash.py`: covers small files, large files, and the virtual-prefix behavior