mirror of https://github.com/azaion/detections.git synced 2026-04-23 02:16:31 +00:00

Oleksandr Bezdieniezhnykh 1fe9425aa8 [AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow

- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor

2026-03-31 17:25:58 +03:00

Module: media_hash

Purpose

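Content-based hashing for media files using XxHash64 with a deterministic sampling algorithm. Produces a stable ID derived from a file's size and sampled content. Because large files are sampled rather than hashed in full, the ID is unique in practice but not guaranteed unique.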

Public Interface

| Function | Signature | Description |
|---|---|---|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns hex XxHash64 digest of sampled content. If `virtual=True`, prefixes with "V". |

Internal Logic

Sampling Algorithm (`_sampling_payload`)

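- Small files (< 3072 bytes): the entire content is used
- Large files (≥ 3072 bytes): three 1024-byte windows are sampled (the first 1024 bytes, the middle 1024 bytes, and the last 1024 bytes)
- All payloads are prefixed with the file size as an 8-byte little-endian integer, so files of different sizes never share a payload even when their sampled windows match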

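Sampling keeps hashing cost constant for large files, since at most 3080 bytes (8-byte size prefix plus 3 × 1024 sampled bytes) pass through the hash function. Uniqueness stays high because the head, middle, and tail windows capture format headers, content, and EOF markers respectively.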
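The rules above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the module's actual source: the name `sampling_payload_sketch`, the constants, the unsigned size encoding, and the exact placement of the middle window (centred on the file midpoint) are all assumptions.

```python
WINDOW = 1024            # bytes per sampled window (from the description above)
THRESHOLD = 3 * WINDOW   # 3072 bytes: below this, the whole content is used


def sampling_payload_sketch(data: bytes) -> bytes:
    """Build the byte string that would be fed to the hash function."""
    size = len(data)
    # 8-byte little-endian size prefix; unsigned encoding is an assumption.
    prefix = size.to_bytes(8, "little")
    if size < THRESHOLD:
        return prefix + data
    # Assumption: the middle window is centred on the midpoint of the file.
    mid_start = (size - WINDOW) // 2
    head = data[:WINDOW]
    middle = data[mid_start:mid_start + WINDOW]
    tail = data[-WINDOW:]
    return prefix + head + middle + tail
```

Under these assumptions, passing the result to `xxhash.xxh64(payload).hexdigest()` would yield the hex digest, with the "V" prefix added afterwards when `virtual=True`.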

Dependencies

External: xxhash (pinned at 3.5.0 in requirements.txt)
Internal: none (leaf module)

Consumers

main — computes content hash for uploaded media in POST /detect to use as the media record ID and storage filename

Data Models

None.

Configuration

None.

External Integrations

None.

Security

None. The hash is non-cryptographic: it is fast but not tamper-resistant, so the resulting ID should not be relied on for integrity or authenticity checks.

Tests

tests/test_media_hash.py — covers small files, large files, and virtual prefix behavior
