mirror of https://github.com/azaion/detections.git — synced 2026-04-22 05:26:32 +00:00
[AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow
- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
@@ -16,11 +16,12 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
 | 8 | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
 | 9 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
 | 10 | Run Tests | test-run/SKILL.md | Steps 1–4 |
-| 11 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
-| 12 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
-| 13 | Deploy | deploy/SKILL.md | Step 1–7 |
+| 11 | Update Docs | document/SKILL.md (task mode) | Task Steps 0–5 |
+| 12 | Security Audit | security/SKILL.md | Phase 1–5 (optional) |
+| 13 | Performance Test | (autopilot-managed) | Load/stress tests (optional) |
+| 14 | Deploy | deploy/SKILL.md | Step 1–7 |
 
-After Step 13, the existing-code workflow is complete.
+After Step 14, the existing-code workflow is complete.
 
 ## Detection Rules
 
@@ -157,8 +158,24 @@ Action: Read and execute `.cursor/skills/test-run/SKILL.md`
 
 ---
 
-**Step 11 — Security Audit (optional)**
-Condition: the autopilot state shows Step 10 (Run Tests) is completed AND the autopilot state does NOT show Step 11 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 11 — Update Docs**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND the autopilot state does NOT show Step 11 (Update Docs) as completed AND `_docs/02_document/` contains existing documentation (module or component docs)
+
+Action: Read and execute `.cursor/skills/document/SKILL.md` in **Task mode**. Pass all task spec files from `_docs/02_tasks/done/` that were implemented in the current cycle (i.e., tasks moved to `done/` during Steps 8–9 of this cycle).
+
+The document skill in Task mode:
+1. Reads each task spec to identify changed source files
+2. Updates affected module docs, component docs, and system-level docs
+3. Does NOT redo full discovery, verification, or problem extraction
+
+If `_docs/02_document/` does not contain existing docs (e.g., the documentation step was skipped), mark Step 11 as `skipped`.
+
+After completion, auto-chain to Step 12 (Security Audit).
+
+---
+
+**Step 12 — Security Audit (optional)**
+Condition: the autopilot state shows Step 11 (Update Docs) is completed or skipped AND the autopilot state does NOT show Step 12 (Security Audit) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
 
 Action: Present using Choose format:
 
@@ -173,13 +190,13 @@ Action: Present using Choose format:
 ══════════════════════════════════════
 ```
 
-- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 12 (Performance Test).
-- If user picks B → Mark Step 11 as `skipped` in the state file, auto-chain to Step 12 (Performance Test).
+- If user picks A → Read and execute `.cursor/skills/security/SKILL.md`. After completion, auto-chain to Step 13 (Performance Test).
+- If user picks B → Mark Step 12 as `skipped` in the state file, auto-chain to Step 13 (Performance Test).
 
 ---
 
-**Step 12 — Performance Test (optional)**
-Condition: the autopilot state shows Step 11 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 12 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 13 — Performance Test (optional)**
+Condition: the autopilot state shows Step 12 (Security Audit) is completed or skipped AND the autopilot state does NOT show Step 13 (Performance Test) as completed or skipped AND (`_docs/04_deploy/` does not exist or is incomplete)
 
 Action: Present using Choose format:
 
@@ -200,13 +217,13 @@ Action: Present using Choose format:
 2. Otherwise, check if `_docs/02_document/tests/performance-tests.md` exists for test scenarios, detect appropriate load testing tool (k6, locust, artillery, wrk, or built-in benchmarks), and execute performance test scenarios against the running system
 3. Present results vs acceptance criteria thresholds
 4. If thresholds fail → present Choose format: A) Fix and re-run, B) Proceed anyway, C) Abort
-5. After completion, auto-chain to Step 13 (Deploy)
-- If user picks B → Mark Step 12 as `skipped` in the state file, auto-chain to Step 13 (Deploy).
+5. After completion, auto-chain to Step 14 (Deploy)
+- If user picks B → Mark Step 13 as `skipped` in the state file, auto-chain to Step 14 (Deploy).
 
 ---
 
-**Step 13 — Deploy**
-Condition: the autopilot state shows Step 10 (Run Tests) is completed AND (Step 11 is completed or skipped) AND (Step 12 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)
+**Step 14 — Deploy**
+Condition: the autopilot state shows Step 10 (Run Tests) is completed AND (Step 11 is completed or skipped) AND (Step 12 is completed or skipped) AND (Step 13 is completed or skipped) AND (`_docs/04_deploy/` does not exist or is incomplete)
 
 Action: Read and execute `.cursor/skills/deploy/SKILL.md`
 
@@ -215,7 +232,7 @@ After deployment completes, the existing-code workflow is done.
 ---
 
 **Re-Entry After Completion**
-Condition: the autopilot state shows `step: done` OR all steps through 13 (Deploy) are completed
+Condition: the autopilot state shows `step: done` OR all steps through 14 (Deploy) are completed
 
 Action: The project completed a full cycle. Print the status banner and automatically loop back to New Task — do NOT ask the user for confirmation:
 
@@ -230,6 +247,8 @@ Action: The project completed a full cycle. Print the status banner and automati
 
 Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step 8 (New Task).
 
+Note: the loop (Steps 8 → 14 → 8) ensures every feature cycle includes: New Task → Implement → Run Tests → Update Docs → Security → Performance → Deploy.
+
 ## Auto-Chain Rules
 
 | Completed Step | Next Action |
@@ -243,10 +262,11 @@ Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step
 | Refactor (7, done or skipped) | Auto-chain → New Task (8) |
 | New Task (8) | **Session boundary** — suggest new conversation before Implement |
 | Implement (9) | Auto-chain → Run Tests (10) |
-| Run Tests (10, all pass) | Auto-chain → Security Audit choice (11) |
-| Security Audit (11, done or skipped) | Auto-chain → Performance Test choice (12) |
-| Performance Test (12, done or skipped) | Auto-chain → Deploy (13) |
-| Deploy (13) | **Workflow complete** — existing-code flow done |
+| Run Tests (10, all pass) | Auto-chain → Update Docs (11) |
+| Update Docs (11) | Auto-chain → Security Audit choice (12) |
+| Security Audit (12, done or skipped) | Auto-chain → Performance Test choice (13) |
+| Performance Test (13, done or skipped) | Auto-chain → Deploy (14) |
+| Deploy (14) | **Workflow complete** — existing-code flow done |
 
 ## Status Summary Template
 
@@ -264,9 +284,10 @@ Set `step: 8`, `status: not_started` in the state file, then auto-chain to Step
 Step 8 New Task [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 Step 9 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
 Step 10 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
-Step 11 Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
-Step 12 Performance Test [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
-Step 13 Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+Step 11 Update Docs [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+Step 12 Security Audit [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+Step 13 Performance Test [DONE / SKIPPED / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+Step 14 Deploy [DONE / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
 ═══════════════════════════════════════════════════
 Current: Step N — Name
 SubStep: M — [sub-skill internal step name]
 
@@ -51,9 +51,12 @@ Determine the execution mode before any other logic:
 | **Full** | No input file, no existing state | Entire codebase |
 | **Focus Area** | User provides a directory path (e.g., `@src/api/`) | Only the specified subtree + transitive dependencies |
 | **Resume** | `state.json` exists in DOCUMENT_DIR | Continue from last checkpoint |
+| **Task** | User provides a task spec file (e.g., `@_docs/02_tasks/done/AZ-173_*.md`) AND `_docs/02_document/` has existing docs | Targeted update of docs affected by the task |
 
 Focus Area mode produces module + component docs for the targeted area only. It can be run repeatedly for different areas — each run appends to the existing module and component docs without overwriting other areas.
 
+Task mode is a lightweight, incremental update — it reads the task spec to identify which files changed, then updates only the affected module docs, component docs, system flows, and data parameters. It does NOT redo discovery, verification, or problem extraction. See **Task Mode Workflow** below.
+
 ## Prerequisite Checks
 
 1. If `_docs/` already exists and contains files AND mode is **Full**, ASK user: **overwrite, merge, or write to `_docs_generated/` instead?**
@@ -387,6 +390,88 @@ Using `.cursor/skills/plan/templates/final-report.md` as structure:
 
 ---
 
+## Task Mode Workflow
+
+Triggered when a task spec file is provided as input AND `_docs/02_document/` already contains module/component docs.
+
+**Accepts**: one or more task spec files (from `_docs/02_tasks/todo/` or `_docs/02_tasks/done/`).
+
+### Task Step 0: Scope Analysis
+
+1. Read each task spec — extract the "Files Modified" or "Scope / Included" section to identify which source files were changed
+2. Map changed source files to existing module docs in `DOCUMENT_DIR/modules/`
+3. Map affected modules to their parent components in `DOCUMENT_DIR/components/`
+4. Identify which higher-level docs might be affected (system-flows, data_model, data_parameters)
+
+**Output**: a list of docs to update, organized by level:
+- Module docs (direct matches)
+- Component docs (parents of affected modules)
+- System-level docs (only if the task changed API endpoints, data models, or external integrations)
+- Problem-level docs (only if the task changed input parameters, acceptance criteria, or restrictions)
+
+### Task Step 1: Module Doc Updates
+
+For each affected module:
+
+1. Read the current source file
+2. Read the existing module doc
+3. Diff the module doc against current code — identify:
+   - New functions/methods/classes not in the doc
+   - Removed functions/methods/classes still in the doc
+   - Changed signatures or behavior
+   - New/removed dependencies
+   - New/removed external integrations
+4. Update the module doc in-place, preserving the existing structure and style
+5. If a module is entirely new (no existing doc), create a new module doc following the standard template
+
+### Task Step 2: Component Doc Updates
+
+For each affected component:
+
+1. Read all module docs belonging to this component (including freshly updated ones)
+2. Read the existing component doc
+3. Update internal interfaces, dependency graphs, implementation details, and caveats sections
+4. Do NOT change the component's purpose, pattern, or high-level overview unless the task fundamentally changed it
+
+### Task Step 3: System-Level Doc Updates (conditional)
+
+Only if the task changed API endpoints, system flows, data models, or external integrations:
+
+1. Update `system-flows.md` — modify affected flow diagrams and data flow tables
+2. Update `data_model.md` — if entities changed
+3. Update `architecture.md` — only if new external integrations or architectural patterns were added
+
+### Task Step 4: Problem-Level Doc Updates (conditional)
+
+Only if the task changed API input parameters, configuration, or acceptance criteria:
+
+1. Update `_docs/00_problem/input_data/data_parameters.md`
+2. Update `_docs/00_problem/acceptance_criteria.md` — if new testable criteria emerged
+
+### Task Step 5: Summary
+
+Present a summary of all docs updated:
+
+```
+══════════════════════════════════════
+DOCUMENTATION UPDATE COMPLETE
+══════════════════════════════════════
+Task(s): [task IDs]
+Module docs updated: [count]
+Component docs updated: [count]
+System-level docs updated: [list or "none"]
+Problem-level docs updated: [list or "none"]
+══════════════════════════════════════
+```
+
+### Task Mode Principles
+
+- **Minimal changes**: only update what the task actually changed. Do not rewrite unaffected sections.
+- **Preserve style**: match the existing doc's structure, tone, and level of detail.
+- **Verify against code**: for every entity added or changed in a doc, confirm it exists in the current source.
+- **New modules**: if the task introduced an entirely new source file, create a new module doc from the standard template.
+- **Dead references**: if the task removed code, remove the corresponding doc entries. Do not keep stale references.
+
 ## Artifact Management
 
 ### Directory Structure
 
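Task Step 0 amounts to a pure mapping from the file list in a task spec to existing doc paths. A hedged sketch: the `## Files Modified` heading, the doc-path layout, and the helper names here are assumptions for illustration, not taken from the repo:

```python
import re
from pathlib import Path

# Hypothetical sketch of Task Step 0 (Scope Analysis): extract changed source
# files from a task spec and map them to the module docs that cover them.
def changed_files(task_spec_text: str) -> list[str]:
    """Extract file paths from a 'Files Modified' section (format assumed)."""
    m = re.search(r"## Files Modified\n(.*?)(?:\n##|\Z)", task_spec_text, re.S)
    if not m:
        return []
    return [line.strip("- ").strip()
            for line in m.group(1).splitlines()
            if line.strip().startswith("-")]

def affected_module_docs(files: list[str],
                         document_dir: str = "_docs/02_document") -> list[str]:
    """Map each changed source file to its module doc (naming convention assumed)."""
    return [f"{document_dir}/modules/{Path(f).stem}.md" for f in files]
```

Component and system-level mapping (Steps 0.3 and 0.4) would layer on top of this, using whatever module-to-component index the docs maintain.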
@@ -2,23 +2,27 @@
 
 ## Media Input
 
-### Single Image Detection (POST /detect)
+### Upload Detection (POST /detect)
 
 | Parameter | Type | Source | Description |
 |-----------|------|--------|-------------|
-| file | bytes (multipart) | Client upload | Image file (JPEG, PNG, etc. — any format OpenCV can decode) |
-| config | JSON string (optional) | Query/form field | AIConfigDto overrides |
+| file | bytes (multipart) | Client upload | Image or video file (JPEG, PNG, MP4, MOV, etc.) |
+| config | JSON string (optional) | Form field | AIConfigDto overrides |
+| Authorization header | Bearer token (optional) | HTTP header | JWT for media lifecycle management |
+| x-refresh-token header | string (optional) | HTTP header | Refresh token for JWT renewal |
+
+When auth headers are present, the service: computes an XxHash64 content hash, persists the file to `VIDEOS_DIR`/`IMAGES_DIR`, creates a media record via the Annotations API, and tracks processing status.
 
 ### Media Detection (POST /detect/{media_id})
 
 | Parameter | Type | Source | Description |
 |-----------|------|--------|-------------|
-| media_id | string | URL path | Identifier for media in the Loader service |
-| AIConfigDto body | JSON (optional) | Request body | Configuration overrides |
+| media_id | string | URL path | Identifier for media in the Annotations service |
+| AIConfigDto body | JSON (optional) | Request body | Configuration overrides (merged with DB settings) |
 | Authorization header | Bearer token | HTTP header | JWT for Annotations service |
 | x-refresh-token header | string | HTTP header | Refresh token for JWT renewal |
 
-Media files (images and videos) are resolved by the Inference pipeline via paths in the config. The Loader service provides model files, not media files directly.
+Media path is resolved from the Annotations service via `GET /api/media/{media_id}`. AI settings are fetched from `GET /api/users/{user_id}/ai-settings` and merged with client overrides.
 
 ## Configuration Input (AIConfigDto / AIRecognitionConfig)
 
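The "merged with client overrides" behavior can be sketched as a recursive dict merge in which client values win. This is an illustration only; the service's real merge also normalizes casing variants, which is omitted here:

```python
# Hypothetical sketch: merge DB-sourced AI settings with client overrides.
# Client values take precedence; nested sections are merged recursively.
def merge_settings(db: dict, overrides: dict) -> dict:
    out = dict(db)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_settings(out[key], value)  # recurse into sections
        else:
            out[key] = value  # client override wins
    return out
```

For example, overriding only `altitude` inside a nested camera section leaves the other DB-provided camera fields intact.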
@@ -30,12 +34,13 @@ Media files (images and videos) are resolved by the Inference pipeline via paths
 | tracking_distance_confidence | float | 0.0 | Movement threshold for tracking (model-width fraction) |
 | tracking_probability_increase | float | 0.0 | Confidence increase threshold for tracking |
 | tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
-| model_batch_size | int | 1 | Inference batch size |
+| model_batch_size | int | 8 | Inference batch size |
 | big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0-100%) |
 | altitude | float | 400 | Camera altitude in meters |
 | focal_length | float | 24 | Camera focal length in mm |
 | sensor_width | float | 23.5 | Camera sensor width in mm |
-| paths | list[str] | [] | Media file paths to process |
 
+`paths` field was removed in AZ-174 — media paths are now resolved via the Annotations service.
+
 ## Model Files
 
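`tracking_intersection_threshold` drives NMS-style deduplication: among boxes whose overlap ratio exceeds the threshold, only the highest-confidence one survives. A minimal sketch, with an assumed `(x1, y1, x2, y2, confidence)` box format that is not taken from the codebase:

```python
# Hypothetical sketch of overlap-based deduplication controlled by
# tracking_intersection_threshold (default 0.6).
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, conf) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def remove_overlapping(dets, threshold=0.6):
    """Keep the highest-confidence box among any group overlapping above threshold."""
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept
```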
@@ -61,7 +66,7 @@ Array of 19 objects, each with:
 ## Data Volumes
 
 - Single image: up to tens of megapixels (aerial imagery). Large images are tiled.
-- Video: processed frame-by-frame with configurable sampling rate.
+- Video: processed frame-by-frame with configurable sampling rate. Decoded from in-memory bytes via PyAV.
 - Model file: ONNX model size depends on architecture (typically 10-100 MB). TensorRT engines are GPU-specific compiled versions.
 - Detection output: up to 300 detections per frame (model limit).
 
@@ -73,4 +78,4 @@ Array of 19 objects, each with:
 | API responses | JSON | Pydantic model_dump |
 | SSE events | text/event-stream | JSON per event |
 | Internal config | Python dict | AIRecognitionConfig.from_dict() |
-| Legacy (unused) | msgpack | serialize() / from_msgpack() |
+| Content hash | XxHash64 hex string | 16-char hex digest |
 
@@ -2,19 +2,20 @@
 
 ## Overview
 
-**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
+**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, stream-based media preprocessing (images + video from bytes), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
 
 **Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
 
 **Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
-**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
+**Downstream**: API (creates Inference, calls `run_detect_image` and `run_detect_video`).
 
 ## Modules
 
 | Module | Role |
 |--------|------|
-| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
-| `loader_http_client` | HTTP client for model download/upload from Loader service |
+| `inference` | Core orchestrator: engine lifecycle, stream-based image/video processing, postprocessing |
+| `loader_http_client` | HTTP client for model download/upload (Loader) and API queries (Annotations service) |
+| `media_hash` | XxHash64 content hashing with sampling algorithm for media identification |
 
 ## Internal Interfaces
 
@@ -23,25 +24,42 @@
 ```
 cdef class Inference:
     __init__(loader_client)
-    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
-    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
+    cpdef run_detect_image(bytes image_bytes, AIRecognitionConfig ai_config, str media_name,
+                           annotation_callback, status_callback=None)
+    cpdef run_detect_video(bytes video_bytes, AIRecognitionConfig ai_config, str media_name,
+                           str save_path, annotation_callback, status_callback=None)
     cpdef stop()
 
     # Internal pipeline stages:
     cdef init_ai()
     cdef preprocess(frames) -> ndarray
     cdef postprocess(output, ai_config) -> list[list[Detection]]
-    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
-    cdef _process_images(AIRecognitionConfig, list[str] paths)
-    cdef _process_video(AIRecognitionConfig, str video_name)
+    cdef _process_video_pyav(AIRecognitionConfig, str original_media_name, object container)
+    cdef _process_video_batch(AIRecognitionConfig, list frames, list timestamps, str name, int frame_count, int total, int model_w)
+    cdef _append_image_frame_entries(AIRecognitionConfig, list all_frame_data, frame, str original_media_name)
+    cdef _finalize_image_inference(AIRecognitionConfig, list all_frame_data)
+    cdef split_to_tiles(frame, str media_stem, tile_size, overlap_percent)
+    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection] (delegated to engine)
 ```
 
+### Free Functions
+
+```
+def ai_config_from_dict(dict data) -> AIRecognitionConfig
+```
+
 ### LoaderHttpClient
 
 ```
 class LoaderHttpClient:
-    load_big_small_resource(str filename, str directory) -> LoadResult
-    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
+    cdef load_big_small_resource(str filename, str directory) -> LoadResult
+    cdef upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
+    cpdef fetch_user_ai_settings(str user_id, str bearer_token) -> object
+    cpdef fetch_media_path(str media_id, str bearer_token) -> object
 ```
 
+### media_hash
+
+```
+def compute_media_content_hash(data: bytes, virtual: bool = False) -> str
+```
+
 ## External API
 
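The `compute_media_content_hash` signature suggests hashing sampled regions of the media rather than every byte. The real module uses XxHash64 (the `xxhash` package); to keep this sketch dependency-free it substitutes stdlib `hashlib.blake2b` with an 8-byte digest, which likewise yields a 16-char hex string. The head/middle/tail sampling layout here is an assumption, not the actual algorithm:

```python
import hashlib

# Hypothetical sketch: hash the length plus head/middle/tail samples of the
# media bytes. The real module uses XxHash64; blake2b(digest_size=8) is a
# stdlib stand-in that also produces a 16-char hex digest.
def compute_media_content_hash(data: bytes, sample: int = 1 << 16) -> str:
    h = hashlib.blake2b(digest_size=8)
    h.update(len(data).to_bytes(8, "little"))  # disambiguate same-prefix files
    if len(data) <= 3 * sample:
        h.update(data)  # small file: hash everything
    else:
        mid = len(data) // 2
        for chunk in (data[:sample], data[mid:mid + sample], data[-sample:]):
            h.update(chunk)
    return h.hexdigest()
```

Sampling keeps hashing cheap for multi-gigabyte videos while still detecting duplicates with high probability.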
@@ -52,9 +70,10 @@ None — internal component, consumed by API layer.
 
 - Model bytes downloaded from Loader service (HTTP)
 - Converted TensorRT engines uploaded back to Loader for caching
-- Video frames read via OpenCV VideoCapture
-- Images read via OpenCV imread
-- All processing is in-memory
+- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`)
+- Images decoded from in-memory bytes via `cv2.imdecode`
+- Video bytes concurrently written to persistent storage path in background thread
+- All inference processing is in-memory
 
 ## Implementation Details
 
@@ -92,8 +111,10 @@ None — internal component, consumed by API layer.
 - Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
 - Physical size filtering: remove detections exceeding class max_object_size_meters
 
-### Video Processing
+### Video Processing (PyAV-based — AZ-173)
 
+- Reads video from in-memory `BytesIO` via `av.open` (no filesystem read needed)
+- Concurrently writes bytes to `save_path` for persistent storage
 - Frame sampling: every Nth frame
 - Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
 - JPEG encoding of valid frames for annotation images
@@ -107,9 +128,9 @@ None — internal component, consumed by API layer.
 
 - `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
 - Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
-- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
+- `init_ai()` called on every detection entry point — idempotent but checks engine state each time
 - Video processing is sequential per video (no parallel video processing)
-- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
+- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect_image` invocation
 
 ## Dependency Graph
 
@@ -122,6 +143,7 @@ graph TD
     inference -.-> onnx_engine
     inference -.-> tensorrt_engine
     inference --> loader_http_client
+    main --> media_hash
 ```
 
 ## Logging Strategy
 
@@ -2,18 +2,18 @@
 
 ## Overview
 
-**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.
+**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, media lifecycle management, DB-driven configuration, and authentication token forwarding.
 
-**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.
+**Pattern**: Controller layer — thin API surface that delegates inference to the Inference Pipeline and manages media records via the Annotations service.
 
-**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
+**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels), Annotations service (AI settings, media records).
 **Downstream**: None (top-level, client-facing).
 
 ## Modules
 
 | Module | Role |
 |--------|------|
-| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |
+| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming, media lifecycle, DB-driven config resolution |
 
 ## External API Specification
 
@@ -24,6 +24,7 @@
 {
   "status": "healthy",
   "aiAvailability": "Enabled",
+  "engineType": "onnx",
   "errorMessage": null
 }
 ```
@@ -31,7 +32,7 @@
 
 ### POST /detect
 
-**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
+**Input**: Multipart form — `file` (image or video bytes), optional `config` (JSON string), optional auth headers.
 **Response**: `list[DetectionDto]`
 ```json
 [
@@ -46,14 +47,15 @@
 }
 ]
 ```
-**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).
+**Behavior** (AZ-173, AZ-175): Accepts both images and videos. Detects upload kind by extension, falls back to content probing. If authenticated: computes content hash, persists to storage, creates media record, tracks status lifecycle (New → AI Processing → AI Processed / Error).
+**Errors**: 400 (empty/invalid image data), 422 (runtime error), 503 (engine unavailable).
 
 ### POST /detect/{media_id}
 
 **Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
 **Response**: `{"status": "started", "mediaId": "..."}` (202-style).
-**Errors**: 409 (duplicate detection for same media_id).
-**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.
+**Behavior** (AZ-174): Resolves media path and AI settings from Annotations service. Merges DB settings with client overrides. Reads file bytes from resolved path, dispatches `run_detect_image` or `run_detect_video`.
+**Errors**: 409 (duplicate detection), 503 (media path not resolved).
 
 ### GET /detect/stream
 
@@ -66,9 +68,10 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m
 ## Data Access Patterns
 
 - In-memory state:
-  - `_active_detections: dict[str, bool]` — guards against duplicate media processing
+  - `_active_detections: dict[str, asyncio.Task]` — guards against duplicate media processing
   - `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
-- No database access
+- Persistent media storage to `VIDEOS_DIR` / `IMAGES_DIR` (AZ-175)
+- No direct database access — media records managed via Annotations service HTTP API
 
 ## Implementation Details
 
@@ -76,8 +79,10 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m
 - `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
 - SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
 - `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
-- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
-- Annotations posted to external service with base64-encoded frame image
+- `TokenManager.decode_user_id` extracts user identity from multiple JWT claim formats (sub, userId, nameid, SAML)
+- DB-driven config via `_resolve_media_for_detect`: fetches AI settings from Annotations, merges nested sections and casing variants
+- Media lifecycle: `_post_media_record` + `_put_media_status` manage status transitions via Annotations API
+- Content hashing via `compute_media_content_hash` (XxHash64 with sampling) for media deduplication
 
 ## Caveats
 
@@ -88,6 +93,7 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m
 - SSE queue overflow silently drops events (QueueFull caught and ignored)
 - JWT token handling has no signature verification — relies entirely on the Annotations service for auth
 - No graceful shutdown handling for in-progress detections
+- Media record creation failures are silently caught (detection proceeds regardless)
 
 ## Dependency Graph
 
@@ -96,6 +102,7 @@ graph TD
     main --> inference
     main --> constants_inf
     main --> loader_http_client
+    main --> media_hash
 ```
 
 ## Logging Strategy
 
@@ -2,7 +2,7 @@
 
 ## Purpose
 
-Data class holding all AI recognition configuration parameters, with factory methods for deserialization from msgpack and dict formats.
+Data class holding all AI recognition configuration parameters, with factory method for deserialization from dict format.
 
 ## Public Interface
 
@@ -18,8 +18,6 @@ Data class holding all AI recognition configuration parameters, with factory met
|
||||
| `tracking_distance_confidence` | double | 0.0 | Distance threshold for tracking (model-width units) |
|
||||
| `tracking_probability_increase` | double | 0.0 | Required confidence increase for tracking update |
|
||||
| `tracking_intersection_threshold` | double | 0.6 | IoU threshold for overlapping detection removal |
|
||||
| `file_data` | bytes | `b''` | Raw file data (msgpack use) |
|
||||
| `paths` | list[str] | `[]` | Media file paths to process |
|
||||
| `model_batch_size` | int | 1 | Batch size for inference |
|
||||
| `big_image_tile_overlap_percent` | int | 20 | Tile overlap percentage for large image splitting |
|
||||
| `altitude` | double | 400 | Camera altitude in meters |
|
||||
@@ -30,18 +28,17 @@ Data class holding all AI recognition configuration parameters, with factory met
|
||||
|
||||
| Method | Signature | Description |
|
||||
|--------|-----------|-------------|
|
||||
| `from_msgpack` | `(bytes data) -> AIRecognitionConfig` | Static cdef; deserializes from msgpack binary |
|
||||
| `from_dict` | `(dict data) -> AIRecognitionConfig` | Static def; deserializes from Python dict |
|
||||
| `from_dict` | `(dict data) -> AIRecognitionConfig` | Static cdef; deserializes from Python dict |
|
||||
|
||||
## Internal Logic
|
||||
|
||||
Both factory methods apply defaults for missing keys. `from_msgpack` uses compact single-character keys (`f_pr`, `pt`, `t_dc`, etc.) while `from_dict` uses full descriptive keys.
|
||||
`from_dict` applies defaults for missing keys using full descriptive key names.
|
||||
|
||||
**Legacy/unused**: `from_msgpack()` is defined but never called in the current codebase — it is a remnant of a previous queue-based architecture. Only `from_dict()` is actively used. The `file_data` field is stored but never read anywhere.
|
||||
**Removed**: `paths` field and `file_data` field were removed as part of the distributed architecture shift (AZ-174). Media paths are now resolved via the Annotations service API, not passed in config. `from_msgpack()` was also removed as it was unused.
|
||||
|
||||
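The `from_dict` defaulting behavior can be sketched in plain Python. The field subset and defaults below are taken from the table above; the class body is illustrative, not the actual Cython implementation.

```python
from dataclasses import dataclass

# Illustrative sketch of from_dict-style deserialization with defaults.
# Field names and default values follow the table above; this is not the
# actual Cython class.
@dataclass
class AIRecognitionConfigSketch:
    tracking_intersection_threshold: float = 0.6
    model_batch_size: int = 1
    big_image_tile_overlap_percent: int = 20
    altitude: float = 400.0

    @staticmethod
    def from_dict(data: dict) -> "AIRecognitionConfigSketch":
        cfg = AIRecognitionConfigSketch()
        for key in vars(cfg):
            if key in data:
                setattr(cfg, key, data[key])  # override; otherwise keep default
        return cfg

cfg = AIRecognitionConfigSketch.from_dict({"model_batch_size": 4})
print(cfg.model_batch_size, cfg.altitude)  # → 4 400.0
```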
## Dependencies

- **External**: `msgpack`
- **External**: none
- **Internal**: none (leaf module)

## Consumers

@@ -66,4 +63,4 @@ None.

## Tests

None found.
- `tests/test_ai_config_from_dict.py` — tests `ai_config_from_dict()` helper with defaults and overrides

@@ -6,32 +6,43 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me

## Public Interface

### Free Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `ai_config_from_dict` | `(dict data) -> AIRecognitionConfig` | Python-callable wrapper around `AIRecognitionConfig.from_dict` |

### Class: Inference

#### Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | object | internal | LoaderHttpClient instance |
| `loader_client` | LoaderHttpClient | internal | HTTP client for model download/upload |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `model_width` | int | internal | Model input width in pixels |
| `model_height` | int | internal | Model input height in pixels |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |

#### Properties

| Property | Return Type | Description |
|----------|-------------|-------------|
| `is_engine_ready` | bool | True if engine is not None |
| `engine_name` | str or None | Engine type name from the active engine |

#### Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
| `run_detect_image` | `(bytes image_bytes, AIRecognitionConfig ai_config, str media_name, annotation_callback, status_callback=None)` | cpdef | Decodes image from bytes, runs tiling + inference + postprocessing |
| `run_detect_video` | `(bytes video_bytes, AIRecognitionConfig ai_config, str media_name, str save_path, annotation_callback, status_callback=None)` | cpdef | Processes video from in-memory bytes via PyAV, concurrently writes to save_path |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | via engine | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | via engine | Parses engine output to Detection objects, applies confidence threshold and overlap removal |

## Internal Logic

@@ -42,36 +53,27 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me
3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
4. If no GPU → load OnnxEngine from ONNX model bytes

### Preprocessing
### Stream-Based Media Processing (AZ-173)

- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file paths. This supports the distributed architecture, where media arrives as HTTP uploads or is read from storage by the API layer.

### Postprocessing
### Image Processing (`run_detect_image`)

- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
1. Decodes image bytes via `cv2.imdecode`
2. Small images (≤1.5× model size): processed as a single frame
3. Large images: split into tiles based on GSD. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by a configurable percentage.
4. Tile deduplication: absolute-coordinate comparison across adjacent tiles
5. Size filtering: detections exceeding `AnnotationClass.max_object_size_meters` are removed

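The tile-size rule in step 3 can be illustrated numerically. `METERS_IN_TILE` is a constant from `constants_inf`; the value used below (100 m) is an assumption for illustration only, as is the helper naming.

```python
# Hypothetical illustration of tile sizing: tile_px = METERS_IN_TILE / GSD.
# METERS_IN_TILE = 100 is an assumed value; the real constant lives in
# constants_inf and may differ.
METERS_IN_TILE = 100.0

def tile_size_px(gsd_m_per_px: float) -> int:
    """Pixels per tile so each tile covers METERS_IN_TILE meters of ground."""
    return int(METERS_IN_TILE / gsd_m_per_px)

def tile_step_px(gsd_m_per_px: float, overlap_percent: int = 20) -> int:
    """Stride between tile origins, accounting for the configured overlap."""
    size = tile_size_px(gsd_m_per_px)
    return int(size * (1 - overlap_percent / 100))

print(tile_size_px(0.05))      # 5 cm/px → 2000 px tiles
print(tile_step_px(0.05, 20))  # 20% overlap → 1600 px stride
```

Higher altitude means a coarser GSD and therefore smaller tiles in pixels, which keeps each tile's ground coverage constant.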
### Image Processing
### Video Processing (`run_detect_video`)

- Small images (≤1.5× model size): processed as a single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by a configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size is computed from GSD × pixel dimensions.

### Video Processing

- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity: must differ from the previous annotation in at least one of:
  - Time gap ≥ `frame_recognition_seconds`
  - More detections than previous
  - Any detection moved beyond the `tracking_distance_confidence` threshold
  - Any detection confidence increased beyond `tracking_probability_increase`
- Valid frames get a JPEG-encoded image attached
1. Concurrently writes raw bytes to `save_path` in a background thread (for persistent storage)
2. Opens video from in-memory `BytesIO` via PyAV (`av.open`)
3. Decodes frames via `container.decode(vstream)` — no temporary file needed for reading
4. Frame sampling: every Nth frame (`frame_period_recognition`)
5. Batch accumulation up to engine batch size
6. Annotation validity heuristics (time gap, detection count increase, spatial movement, confidence improvement)
7. Valid frames get a JPEG-encoded image attached

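The validity heuristics in step 6 can be sketched as a pure-Python predicate. Names, tuple layout, and thresholds are hypothetical — the real implementation is Cython and operates on Detection objects.

```python
import math

# Hypothetical sketch of the annotation-validity check: a new annotation is
# kept only if it differs enough from the previous one, per the heuristics
# listed above. Detections are (cx, cy, confidence) tuples for brevity.
def annotation_is_valid(prev, curr, prev_t, curr_t,
                        frame_recognition_seconds=1.0,
                        tracking_distance_confidence=0.05,
                        tracking_probability_increase=0.1) -> bool:
    if curr_t - prev_t >= frame_recognition_seconds:
        return True                      # time gap large enough
    if len(curr) > len(prev):
        return True                      # more detections than before
    for (px, py, pc), (cx, cy, cc) in zip(prev, curr):
        if math.hypot(cx - px, cy - py) > tracking_distance_confidence:
            return True                  # a detection moved significantly
        if cc - pc > tracking_probability_increase:
            return True                  # confidence improved significantly
    return False

prev = [(0.50, 0.50, 0.80)]
print(annotation_is_valid(prev, [(0.51, 0.50, 0.80)], 0.0, 0.2))  # → False
print(annotation_is_valid(prev, [(0.70, 0.50, 0.80)], 0.0, 0.2))  # → True
```

The effect is bandwidth control: near-identical consecutive frames produce no new annotations, so only meaningful changes are posted downstream.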
### Ground Sampling Distance (GSD)

@@ -79,12 +81,12 @@ Core inference orchestrator — manages the AI engine lifecycle, preprocesses me

## Dependencies

- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- **External**: `cv2`, `numpy`, `av` (PyAV), `io`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`
- `main` — lazy-initializes Inference, calls `run_detect_image`/`run_detect_video`, reads `ai_availability_status` and `is_engine_ready`

## Data Models

@@ -104,4 +106,6 @@ None.

## Tests

None found.
- `tests/test_ai_config_from_dict.py` — tests `ai_config_from_dict` helper
- `e2e/tests/test_video.py` — exercises `run_detect_video` via the full API
- `e2e/tests/test_single_image.py` — exercises `run_detect_image` via the full API

@@ -2,7 +2,7 @@

## Purpose

HTTP client for downloading and uploading model files (and other binary resources) via an external Loader microservice.
HTTP client for downloading/uploading model files via the Loader service, and for querying the Annotations service API (user AI settings, media path resolution).

## Public Interface

@@ -17,16 +17,19 @@ Simple result wrapper.

### Class: LoaderHttpClient

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str base_url)` | Stores base URL, strips trailing slash |
| `load_big_small_resource` | `(str filename, str directory) -> LoadResult` | POST to `/load/(unknown)` with JSON body `{filename, folder}`, returns raw bytes |
| `upload_big_small_resource` | `(bytes content, str filename, str directory) -> LoadResult` | POST to `/upload/(unknown)` with multipart file + form data `{folder}` |
| `stop` | `() -> None` | No-op placeholder |
| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(str base_url)` | public | Stores base URL, strips trailing slash |
| `load_big_small_resource` | `(str filename, str directory) -> LoadResult` | cdef | POST to `/load/(unknown)` with JSON body, returns raw bytes |
| `upload_big_small_resource` | `(bytes content, str filename, str directory) -> LoadResult` | cdef | POST to `/upload/(unknown)` with multipart file |
| `fetch_user_ai_settings` | `(str user_id, str bearer_token) -> object` | cpdef | GET `/api/users/{user_id}/ai-settings`, returns parsed JSON dict or None |
| `fetch_media_path` | `(str media_id, str bearer_token) -> object` | cpdef | GET `/api/media/{media_id}`, returns `path` string from response or None |

## Internal Logic

Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Errors are logged via loguru but never raised.
Model load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Errors are logged via loguru but never raised.

`fetch_user_ai_settings` and `fetch_media_path` (added in AZ-174) call the Annotations service API with Bearer auth. On a non-200 response or an exception, they return None.

## Dependencies

@@ -36,7 +39,7 @@ Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Erro
## Consumers

- `inference` — downloads ONNX/TensorRT models, uploads converted TensorRT engines
- `main` — instantiates client with `LOADER_URL`
- `main` — instantiates two clients: one for Loader (`LOADER_URL`), one for Annotations (`ANNOTATIONS_URL`). Uses `fetch_user_ai_settings` and `fetch_media_path` on the annotations client.

## Data Models

@@ -44,18 +47,19 @@ Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Erro

## Configuration

- `base_url` — provided at construction time, sourced from `LOADER_URL` environment variable in `main.py`
- `base_url` — provided at construction time, sourced from env vars in `main.py`

## External Integrations

| Integration | Protocol | Endpoint Pattern |
|-------------|----------|-----------------|
| Loader service | HTTP POST | `/load/(unknown)` (download), `/upload/(unknown)` (upload) |
| Annotations service | HTTP GET | `/api/users/{user_id}/ai-settings`, `/api/media/{media_id}` |

## Security

None (no auth headers sent to loader).
Bearer token forwarded in Authorization header for Annotations service calls.

## Tests

None found.
- `tests/test_az174_db_driven_config.py` — tests `_resolve_media_for_detect` which exercises `fetch_user_ai_settings` and `fetch_media_path`

@@ -2,7 +2,7 @@

## Purpose

FastAPI application entry point — exposes HTTP API for object detection on images and video media, health checks, and Server-Sent Events (SSE) streaming of detection results.
FastAPI application entry point — exposes HTTP API for object detection on image and video media, health checks, and Server-Sent Events (SSE) streaming of detection results. Manages the media lifecycle (content hashing, persistent storage, media record creation, status updates) and DB-driven AI configuration.

## Public Interface

@@ -11,8 +11,8 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Returns AI engine availability status |
| POST | `/detect` | Single image detection (multipart file upload) |
| POST | `/detect/{media_id}` | Start async detection on media from loader service |
| POST | `/detect` | Image/video detection with media lifecycle management |
| POST | `/detect/{media_id}` | Start async detection on media resolved from Annotations service |
| GET | `/detect/stream` | SSE stream of detection events |

### DTOs (Pydantic Models)

@@ -21,8 +21,8 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
|-------|--------|-------------|
| `DetectionDto` | centerX, centerY, width, height, classNum, label, confidence | Single detection result |
| `DetectionEvent` | annotations (list[DetectionDto]), mediaId, mediaStatus, mediaPercent | SSE event payload |
| `HealthResponse` | status, aiAvailability, errorMessage | Health check response |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width, paths | Configuration input for media detection |
| `HealthResponse` | status, aiAvailability, engineType, errorMessage | Health check response |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width | Configuration input (no `paths` field — removed in AZ-174) |

### Class: TokenManager

@@ -30,31 +30,55 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
|--------|-----------|-------------|
| `__init__` | `(str access_token, str refresh_token)` | Stores tokens |
| `get_valid_token` | `() -> str` | Returns access_token; auto-refreshes if expiring within 60s |
| `decode_user_id` | `(str token) -> Optional[str]` | Static. Extracts user ID from JWT claims (sub, userId, user_id, nameid, or SAML nameidentifier) |

### Helper Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `_merged_annotation_settings_payload` | `(raw: object) -> dict` | Merges nested AI settings from the Annotations service response (handles `aiRecognitionSettings` and `cameraSettings` sub-objects plus PascalCase/camelCase/snake_case aliases) |
| `_resolve_media_for_detect` | `(media_id, token_mgr, override) -> tuple[dict, str]` | Fetches user AI settings + media path from the Annotations service, merges with client overrides |
| `_detect_upload_kind` | `(filename, data) -> tuple[str, str]` | Determines whether an upload is image or video by extension, falling back to content probing (cv2/PyAV) |
| `_post_media_record` | `(payload, bearer) -> bool` | Creates media record via `POST /api/media` on the Annotations service |
| `_put_media_status` | `(media_id, status, bearer) -> bool` | Updates media status via `PUT /api/media/{media_id}/status` on the Annotations service |
| `compute_media_content_hash` | (imported from `media_hash`) | XxHash64 content hash with sampling |

## Internal Logic

### `/health`

Returns `HealthResponse` with `status="healthy"` always. `aiAvailability` reflects the engine's `AIAvailabilityStatus`. On exception, returns `aiAvailability="None"`.
Always returns `HealthResponse` with `status="healthy"`. `aiAvailability` reflects the engine's `AIAvailabilityStatus`, and `engineType` reports the active engine name. On exception, returns `aiAvailability="None"`.

### `/detect` (single image)
### `/detect` (unified upload — AZ-173, AZ-175)

1. Reads uploaded file bytes
2. Parses optional JSON config
3. Runs `inference.detect_single_image` in ThreadPoolExecutor (max 2 workers)
4. Returns list of DetectionDto
1. Reads uploaded file bytes, rejects empty uploads
2. Detects kind (image/video) via `_detect_upload_kind` (extension → content probe)
3. Validates image data with `cv2.imdecode` if the kind is image
4. Parses optional JSON config
5. Extracts auth tokens; if authenticated:
   a. Computes XxHash64 content hash
   b. Persists file to `VIDEOS_DIR` or `IMAGES_DIR`
   c. Creates media record via `POST /api/media`
   d. Sets status to `AI_PROCESSING` via `PUT /api/media/{id}/status`
6. Runs `run_detect_image` or `run_detect_video` in ThreadPoolExecutor
7. On success: sets status to `AI_PROCESSED`
8. On failure: sets status to `ERROR`
9. Returns list of `DetectionDto`

Error mapping: RuntimeError("not available") → 503, RuntimeError → 422, ValueError → 400.

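The documented error mapping can be sketched as a small translation helper (hypothetical — the actual handler wires these statuses into FastAPI responses directly):

```python
# Hypothetical sketch of the documented error mapping:
# RuntimeError("not available") → 503, other RuntimeError → 422, ValueError → 400.
def http_status_for(exc: Exception) -> int:
    if isinstance(exc, RuntimeError):
        return 503 if "not available" in str(exc) else 422
    if isinstance(exc, ValueError):
        return 400
    return 500  # assumed fallback; the doc does not specify other cases

print(http_status_for(RuntimeError("AI engine not available")))  # → 503
print(http_status_for(ValueError("bad config json")))            # → 400
```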
### `/detect/{media_id}` (async — AZ-174)

### `/detect/{media_id}` (async media)

1. Checks for duplicate active detection (409 if already running)
2. Extracts auth tokens from the Authorization and x-refresh-token headers
3. Creates `asyncio.Task` for background detection
4. Detection runs `inference.run_detect` in ThreadPoolExecutor
5. Callbacks push `DetectionEvent` to all SSE queues
6. If an auth token is present, also POSTs annotations to the Annotations service
7. Returns immediately: `{"status": "started", "mediaId": media_id}`
1. Checks for duplicate active detection (409)
2. Extracts auth tokens
3. Resolves media via `_resolve_media_for_detect`:
   a. Fetches user AI settings from `GET /api/users/{user_id}/ai-settings`
   b. Merges with client overrides
   c. Fetches media path from `GET /api/media/{media_id}`
4. Reads file bytes from the resolved path
5. Creates `asyncio.Task` for background detection
6. Calls `run_detect_video` or `run_detect_image` depending on the file extension
7. Callbacks push `DetectionEvent` to SSE queues and POST annotations to the Annotations service
8. Updates media status via `PUT /api/media/{id}/status`
9. Returns immediately: `{"status": "started", "mediaId": media_id}`

### `/detect/stream` (SSE)

@@ -64,73 +88,18 @@ Error mapping: RuntimeError("not available") → 503, RuntimeError → 422, Valu

### Token Management

- Decodes JWT exp claim from base64 payload (no signature verification)
- `_decode_exp`: decodes the JWT exp claim from the base64 payload (no signature verification)
- Auto-refreshes via POST to `{ANNOTATIONS_URL}/auth/refresh` when within 60s of expiry
- `decode_user_id`: extracts user identity from multiple possible JWT claim keys

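The unverified exp decode can be sketched with the standard library. Helper names are illustrative; the real logic lives in `TokenManager`, and the toy token built here deliberately carries no signature.

```python
import base64, json, time

# Sketch of _decode_exp-style parsing: split the JWT, base64-decode the
# payload, and read `exp` WITHOUT verifying the signature (as the doc warns).
def decode_exp(token: str):
    try:
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return claims.get("exp")
    except Exception:
        return None

def needs_refresh(token: str, window_s: int = 60) -> bool:
    exp = decode_exp(token)
    return exp is None or exp - time.time() < window_s

# Build a toy unsigned token to demonstrate (not a real, signed JWT).
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"exp": int(time.time()) + 3600}).encode()).rstrip(b"=").decode()
token = f"{header}.{payload}."
print(decode_exp(token) is not None, needs_refresh(token))  # → True False
```

Because nothing is verified locally, a forged token only fails when the Annotations service rejects it — exactly the trust boundary the Security section below describes.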
### Annotations Service Integration

Detections posts results to the Annotations service (`POST {ANNOTATIONS_URL}/annotations`) server-to-server during async media detection (F3). This only happens when an auth token is present in the original request.

**Endpoint:** `POST {ANNOTATIONS_URL}/annotations`

**Headers:**

| Header | Value |
|--------|-------|
| Authorization | `Bearer {accessToken}` (forwarded from the original client request) |
| Content-Type | `application/json` |

**Request body — payload sent by Detections:**

| Field | Type | Description |
|-------|------|-------------|
| mediaId | string | ID of the media being processed |
| source | int | `0` (AnnotationSource.AI) |
| videoTime | string | Video playback position formatted from ms as `"HH:MM:SS"` — mapped to `Annotations.Time` |
| detections | list | Detection results for this batch (see below) |
| image | string (base64) | Optional — base64-encoded frame image bytes |

`userId` is not included in the payload. The Annotations service resolves the user identity from the Bearer JWT.

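The `videoTime` formatting (millisecond position → `"HH:MM:SS"`) can be sketched as below; the helper name is hypothetical.

```python
# Hypothetical helper mirroring the documented videoTime format:
# a millisecond playback position rendered as "HH:MM:SS".
def format_video_time(ms: int) -> str:
    total_seconds = ms // 1000
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_video_time(0))          # → 00:00:00
print(format_video_time(75_500))     # → 00:01:15
print(format_video_time(3_723_000))  # → 01:02:03
```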
**Detection object (as sent by Detections):**

| Field | Type | Description |
|-------|------|-------------|
| centerX | float | X center, normalized 0.0–1.0 |
| centerY | float | Y center, normalized 0.0–1.0 |
| width | float | Width, normalized 0.0–1.0 |
| height | float | Height, normalized 0.0–1.0 |
| classNum | int | Detection class number |
| label | string | Human-readable class name |
| confidence | float | Model confidence 0.0–1.0 |

The Annotations API contract (`CreateAnnotationRequest`) also accepts `description` (string), `affiliation` (AffiliationEnum), and `combatReadiness` (CombatReadinessEnum) on each Detection, but the Detections service does not populate these — the Annotations service uses defaults.

**Responses from Annotations service:**

| Status | Condition |
|--------|-----------|
| 201 | Annotation created |
| 400 | Neither image nor mediaId provided |
| 404 | MediaId not found in Annotations DB |

**Failure handling:** POST failures are silently caught — detection processing continues regardless. Annotations that fail to post are not retried.

**Downstream pipeline (Annotations service side):**

1. Saves annotation to local PostgreSQL (image → XxHash64 ID, label file in YOLO format)
2. Publishes SSE event to UI via `GET /annotations/events`
3. Enqueues annotation ID to `annotations_queue_records` buffer table (unless SilentDetection mode is enabled in system settings)
4. `FailsafeProducer` (BackgroundService) drains the buffer to RabbitMQ Stream (`azaion-annotations`) using MessagePack + Gzip

**Token refresh for long-running video:**

For video detection that may outlast the JWT lifetime, the `TokenManager` auto-refreshes via `POST {ANNOTATIONS_URL}/auth/refresh` when the token is within 60s of expiry. The refresh token is provided by the client in the `X-Refresh-Token` request header.
Detections posts results to `POST {ANNOTATIONS_URL}/annotations` during async media detection (F3). Media lifecycle (create record, update status) uses `POST /api/media` and `PUT /api/media/{media_id}/status`.

## Dependencies

- **External**: `asyncio`, `base64`, `json`, `os`, `time`, `concurrent.futures`, `typing`, `requests`, `fastapi`, `pydantic`
- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation)
- **External**: `asyncio`, `base64`, `io`, `json`, `os`, `tempfile`, `time`, `concurrent.futures`, `pathlib`, `typing`, `av`, `cv2`, `numpy`, `requests`, `fastapi`, `pydantic`
- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation), `media_hash` (content hashing)

## Consumers

@@ -147,22 +116,29 @@ None (entry point).
|---------|---------|-------------|
| `LOADER_URL` | `http://loader:8080` | Loader service base URL |
| `ANNOTATIONS_URL` | `http://annotations:8080` | Annotations service base URL |
| `VIDEOS_DIR` | `{cwd}/data/videos` | Persistent video storage directory |
| `IMAGES_DIR` | `{cwd}/data/images` | Persistent image storage directory |

## External Integrations

| Service | Protocol | Purpose |
|---------|----------|---------|
| Loader | HTTP (via LoaderHttpClient) | Model loading |
| Annotations | HTTP POST | Auth refresh (`/auth/refresh`), annotation posting (`/annotations`) |
| Annotations | HTTP GET | User AI settings (`/api/users/{id}/ai-settings`), media path resolution (`/api/media/{id}`) |
| Annotations | HTTP POST | Annotation posting (`/annotations`), media record creation (`/api/media`) |
| Annotations | HTTP PUT | Media status updates (`/api/media/{id}/status`) |
| Annotations | HTTP POST | Auth refresh (`/auth/refresh`) |

## Security

- Bearer token from request headers, refreshed via Annotations service
- JWT exp decoded (base64, no signature verification) — token validation is not performed locally
- Image data validated via `cv2.imdecode` before processing
- No CORS configuration
- No rate limiting
- No input validation on media_id path parameter beyond string type

## Tests

None found.
- `tests/test_az174_db_driven_config.py` — `decode_user_id`, `_merged_annotation_settings_payload`, `_resolve_media_for_detect`
- `tests/test_az175_api_calls.py` — `_post_media_record`, `_put_media_status`
- `e2e/tests/test_*.py` — full API e2e tests (health, single image, video, async, SSE, negative, security, performance, resilience)

@@ -0,0 +1,50 @@
# Module: media_hash

## Purpose

Content-based hashing for media files using XxHash64 with a deterministic sampling algorithm. Produces a stable, unique ID for any media file based on its content.

## Public Interface

| Function | Signature | Description |
|----------|-----------|-------------|
| `compute_media_content_hash` | `(data: bytes, virtual: bool = False) -> str` | Returns hex XxHash64 digest of sampled content. If `virtual=True`, prefixes with "V". |

## Internal Logic

### Sampling Algorithm (`_sampling_payload`)

- **Small files** (< 3072 bytes): uses the entire content
- **Large files** (≥ 3072 bytes): samples 3 × 1024-byte windows — the first 1024, middle 1024, and last 1024 bytes
- All payloads are prefixed with the 8-byte little-endian file size for collision resistance

The sampling avoids feeding the full file through the hash function while still providing high uniqueness — the head, middle, and tail capture format headers, content, and EOF markers.

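The sampling payload can be sketched in pure Python as follows. The real module then feeds this payload to XxHash64 via the `xxhash` package; the exact byte layout of the size prefix shown here is an assumption.

```python
import struct

SAMPLE_WINDOW = 1024
SMALL_FILE_LIMIT = 3 * SAMPLE_WINDOW  # 3072 bytes

# Sketch of _sampling_payload: size prefix + either the whole file (small)
# or head/middle/tail windows (large). The real module hashes this payload
# with XxHash64 (xxhash package); the exact packing here is an assumption.
def sampling_payload(data: bytes) -> bytes:
    size_prefix = struct.pack("<Q", len(data))  # 8-byte little-endian length
    if len(data) < SMALL_FILE_LIMIT:
        return size_prefix + data
    mid = (len(data) - SAMPLE_WINDOW) // 2
    return (size_prefix
            + data[:SAMPLE_WINDOW]             # head: format headers
            + data[mid:mid + SAMPLE_WINDOW]    # middle: representative content
            + data[-SAMPLE_WINDOW:])           # tail: EOF markers

small = sampling_payload(b"x" * 100)
large = sampling_payload(b"y" * 10_000)
print(len(small), len(large))  # → 108 3080
```

Note that for large files the payload is a fixed 3080 bytes regardless of file size, which is what makes hashing O(1) in the file length.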
## Dependencies

- **External**: `xxhash` (pinned at 3.5.0 in requirements.txt)
- **Internal**: none (leaf module)

## Consumers

- `main` — computes content hash for uploaded media in `POST /detect` to use as the media record ID and storage filename

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None. The hash is non-cryptographic (fast, not tamper-resistant).

## Tests

- `tests/test_media_hash.py` — covers small files, large files, and virtual prefix behavior

@@ -5,8 +5,8 @@
| # | Flow Name | Trigger | Primary Components | Criticality |
|---|-----------|---------|-------------------|-------------|
| F1 | Health Check | Client GET /health | API, Inference Pipeline | High |
| F2 | Single Image Detection | Client POST /detect | API, Inference Pipeline, Engines, Domain | High |
| F3 | Media Detection (Async) | Client POST /detect/{media_id} | API, Inference Pipeline, Engines, Domain, Loader, Annotations | High |
| F2 | Upload Detection (Image/Video) | Client POST /detect | API, Inference Pipeline, Engines, Domain, Annotations | High |
| F3 | Media Detection (Async, DB-Driven) | Client POST /detect/{media_id} | API, Inference Pipeline, Engines, Domain, Annotations | High |
| F4 | SSE Event Streaming | Client GET /detect/stream | API | Medium |
| F5 | Engine Initialization | First detection request | Inference Pipeline, Engines, Loader | High |
| F6 | TensorRT Background Conversion | No pre-built TensorRT engine | Inference Pipeline, Engines, Loader | Medium |
@@ -16,8 +16,8 @@
| Flow | Depends On | Shares Data With |
|------|-----------|-----------------|
| F1 | F5 (for meaningful status) | — |
| F2 | F5 (engine must be ready) | — |
| F3 | F5 (engine must be ready) | F4 (via SSE event queues) |
| F2 | F5 (engine must be ready) | Annotations (media lifecycle) |
| F3 | F5 (engine must be ready) | F4 (via SSE event queues), Annotations (settings, media lifecycle) |
| F4 | — | F3 (receives events) |
| F5 | — | F6 (triggers conversion if needed) |
| F6 | F5 (triggered by init failure) | F5 (provides converted bytes) |
@@ -55,11 +55,11 @@ sequenceDiagram

---

## Flow F2: Single Image Detection
## Flow F2: Upload Detection (Image or Video)

### Description

Client uploads an image file and optionally provides config. The service runs inference synchronously (via ThreadPoolExecutor) and returns detection results.
Client uploads a media file (image or video) and optionally provides config and auth tokens. The service detects the media kind, manages the media lifecycle (hashing, storage, record creation, status tracking), runs inference synchronously (via ThreadPoolExecutor), and returns detection results.

### Sequence Diagram
@@ -67,19 +67,40 @@ Client uploads an image file and optionally provides config. The service runs in
sequenceDiagram
participant Client
participant API as main.py
participant HASH as media_hash
participant ANN as Annotations Service
participant INF as Inference
participant ENG as Engine (ONNX/TRT)
participant CONST as constants_inf

Client->>API: POST /detect (file + config?)
API->>API: Read image bytes, parse config
API->>INF: detect_single_image(bytes, config_dict)
Client->>API: POST /detect (file + config? + auth?)
API->>API: Read bytes, detect kind (image/video)
API->>API: Validate image data (cv2.imdecode)

opt Authenticated user
API->>HASH: compute_media_content_hash(bytes)
HASH-->>API: content_hash
API->>API: Persist file to VIDEOS_DIR/IMAGES_DIR
API->>ANN: POST /api/media (create record)
API->>ANN: PUT /api/media/{id}/status (AI_PROCESSING)
end

alt Image
API->>INF: run_detect_image(bytes, ai_config, name, callback)
else Video
API->>INF: run_detect_video(bytes, ai_config, name, path, callback)
end

INF->>INF: init_ai() (idempotent)
INF->>INF: cv2.imdecode → preprocess
INF->>ENG: run(input_blob)
INF->>ENG: process_frames(batch)
ENG-->>INF: raw output
INF->>INF: postprocess → filter by threshold → remove overlaps
INF-->>API: list[Detection]
INF->>INF: postprocess → filter → callbacks
INF-->>API: results via callback

opt Authenticated user
API->>ANN: PUT /api/media/{id}/status (AI_PROCESSED)
end

API->>CONST: annotations_dict[cls].name (label lookup)
API-->>Client: list[DetectionDto]
```
@@ -88,18 +109,20 @@ sequenceDiagram

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Empty image | API | len(bytes)==0 | 400 Bad Request |
| Invalid image data | imdecode | frame is None | 400 ValueError |
| Empty upload | API | len(bytes)==0 | 400 Bad Request |
| Invalid image data | cv2.imdecode | returns None | 400 Bad Request |
| Unrecognized format | _detect_upload_kind | cv2+PyAV probe fails | 400 Bad Request |
| Engine not available | init_ai | engine is None | 503 Service Unavailable |
| Inference failure | run/postprocess | RuntimeError | 422 Unprocessable Entity |
| Media record failure | _post_media_record | exception caught | Silently continues |

---
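The "Silently continues" recovery in the table above is a best-effort pattern: a failure on the media-record side channel is logged but never fails the client's detection request. A minimal sketch (the helper name echoes `_post_media_record` from the table; the signature and logging details are assumptions, not the service's actual code):

```python
import logging

log = logging.getLogger(__name__)

def post_media_record_best_effort(post_fn, payload: dict) -> bool:
    """Hypothetical sketch: attempt the media-record POST and swallow any
    failure so the detection request itself still succeeds.

    Returns True on success, False if the record could not be created.
    """
    try:
        post_fn(payload)
        return True
    except Exception:
        # Best-effort side channel: record the failure, keep serving.
        log.warning("media record creation failed; continuing", exc_info=True)
        return False
```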

## Flow F3: Media Detection (Async)
## Flow F3: Media Detection (Async, DB-Driven)

### Description

Client triggers detection on media files (images/video) available via the Loader service. Processing runs asynchronously. Results are streamed via SSE (F4) and optionally posted to the Annotations service.
Client triggers detection on a media file resolved from the Annotations service. AI settings are fetched from the user's DB profile and merged with client overrides. Processing runs asynchronously. Results are streamed via SSE (F4) and optionally posted to the Annotations service. Media status is tracked throughout.

### Sequence Diagram
@@ -107,37 +130,41 @@ Client triggers detection on media files (images/video) available via the Loader
sequenceDiagram
participant Client
participant API as main.py
participant ANN as Annotations Service
participant INF as Inference
participant ENG as Engine
participant LDR as Loader Service
participant ANN as Annotations Service
participant SSE as SSE Queues

Client->>API: POST /detect/{media_id} (config + auth headers)
Client->>API: POST /detect/{media_id} (config? + auth headers)
API->>API: Check _active_detections (duplicate guard)
API->>ANN: GET /api/users/{user_id}/ai-settings
ANN-->>API: AI settings (merged with overrides)
API->>ANN: GET /api/media/{media_id}
ANN-->>API: media path
API-->>Client: {"status": "started"}

Note over API: asyncio.Task created

API->>INF: run_detect(config, on_annotation, on_status)
loop For each media file
INF->>INF: Read/decode media (cv2)
INF->>INF: Preprocess (tile/batch)
INF->>ENG: run(input_blob)
ENG-->>INF: raw output
INF->>INF: Postprocess + validate
API->>API: Read file bytes from resolved path
API->>ANN: PUT /api/media/{id}/status (AI_PROCESSING)

opt Valid annotation found
INF->>API: on_annotation(annotation, percent)
API->>SSE: DetectionEvent → all queues
opt Auth token present
API->>ANN: POST /annotations (detections + image)
end
alt Video file
API->>INF: run_detect_video(bytes, config, name, path, callbacks)
else Image file
API->>INF: run_detect_image(bytes, config, name, callbacks)
end

loop For each valid annotation
INF->>API: on_annotation(annotation, percent)
API->>SSE: DetectionEvent → all queues
opt Auth token present
API->>ANN: POST /annotations (detections + image)
end
end

INF->>API: on_status(media_name, count)
API->>SSE: DetectionEvent(status=AIProcessed, percent=100)
API->>ANN: PUT /api/media/{id}/status (AI_PROCESSED)
```
### Data Flow
@@ -145,12 +172,16 @@ sequenceDiagram
| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | Client | API | media_id, config, auth tokens | HTTP POST JSON + headers |
| 2 | API | Inference | config_dict, callbacks | Python dict + callables |
| 3 | Inference | Engine | preprocessed batch | numpy ndarray |
| 4 | Engine | Inference | raw detections | numpy ndarray |
| 5 | Inference | API (callback) | Annotation + percent | Python objects |
| 6 | API | SSE clients | DetectionEvent | SSE JSON stream |
| 7 | API | Annotations Service | CreateAnnotationRequest | HTTP POST JSON |
| 2 | API | Annotations | user AI settings request | HTTP GET |
| 3 | API | Annotations | media path request | HTTP GET |
| 4 | API | Annotations | media status update (AI_PROCESSING) | HTTP PUT JSON |
| 5 | API | Inference | file bytes, config, callbacks | bytes + AIRecognitionConfig + callables |
| 6 | Inference | Engine | preprocessed batch | numpy ndarray |
| 7 | Engine | Inference | raw detections | numpy ndarray |
| 8 | Inference | API (callback) | Annotation + percent | Python objects |
| 9 | API | SSE clients | DetectionEvent | SSE JSON stream |
| 10 | API | Annotations Service | CreateAnnotationRequest | HTTP POST JSON |
| 11 | API | Annotations | media status update (AI_PROCESSED) | HTTP PUT JSON |

**Step 10 — Annotations POST detail:**
@@ -2,8 +2,8 @@

## Current Step
flow: existing-code
step: 10
name: Run Tests
step: 11
name: Update Docs
status: done
sub_step: 0
retry_count: 0