mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 09:36:32 +00:00
[AZ-172] Update documentation for distributed architecture, add Update Docs step to workflow
- Update module docs: main, inference, ai_config, loader_http_client
- Add new module doc: media_hash
- Update component docs: inference_pipeline, api
- Update system-flows (F2, F3) and data_parameters
- Add Task Mode to document skill for incremental doc updates
- Insert Step 11 (Update Docs) in existing-code flow, renumber 11-13 to 12-14

Made-with: Cursor
@@ -2,19 +2,20 @@

## Overview

**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, stream-based media preprocessing (images + video from bytes), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.

**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.

**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
**Downstream**: API (creates Inference, calls `run_detect_image` and `run_detect_video`).

## Modules

| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |
| `inference` | Core orchestrator: engine lifecycle, stream-based image/video processing, postprocessing |
| `loader_http_client` | HTTP client for model download/upload (Loader) and API queries (Annotations service) |
| `media_hash` | XxHash64 content hashing with sampling algorithm for media identification |

## Internal Interfaces
@@ -23,25 +24,42 @@

```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef run_detect_image(bytes image_bytes, AIRecognitionConfig ai_config, str media_name,
                           annotation_callback, status_callback=None)
    cpdef run_detect_video(bytes video_bytes, AIRecognitionConfig ai_config, str media_name,
                           str save_path, annotation_callback, status_callback=None)
    cpdef stop()

    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
    cdef _process_video_pyav(AIRecognitionConfig, str original_media_name, object container)
    cdef _process_video_batch(AIRecognitionConfig, list frames, list timestamps, str name, int frame_count, int total, int model_w)
    cdef _append_image_frame_entries(AIRecognitionConfig, list all_frame_data, frame, str original_media_name)
    cdef _finalize_image_inference(AIRecognitionConfig, list all_frame_data)
    cdef split_to_tiles(frame, str media_stem, tile_size, overlap_percent)
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection] (delegated to engine)
```

### Free Functions

```
def ai_config_from_dict(dict data) -> AIRecognitionConfig
```
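The `ai_config_from_dict` free function converts a plain request dict into a typed config object. A minimal Python sketch of that pattern follows — the field names here are illustrative assumptions, not the project's actual `AIRecognitionConfig` schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a dict -> config converter in the spirit of
# `ai_config_from_dict`; fields are examples, not the real schema.
@dataclass
class AIRecognitionConfig:
    confidence_threshold: float = 0.5
    iou_threshold: float = 0.45
    tile_size: int = 640
    classes: list = field(default_factory=list)

def ai_config_from_dict(data: dict) -> AIRecognitionConfig:
    # Ignore unknown keys so clients may send extra fields safely.
    known = set(AIRecognitionConfig.__dataclass_fields__)
    return AIRecognitionConfig(**{k: v for k, v in data.items() if k in known})
```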
### LoaderHttpClient

```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
    cdef load_big_small_resource(str filename, str directory) -> LoadResult
    cdef upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
    cpdef fetch_user_ai_settings(str user_id, str bearer_token) -> object
    cpdef fetch_media_path(str media_id, str bearer_token) -> object
```

### media_hash

```
def compute_media_content_hash(data: bytes, virtual: bool = False) -> str
```
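`compute_media_content_hash` pairs XxHash64 with a sampling scheme so large videos need not be hashed in full. The sketch below shows only the sampling idea: it substitutes stdlib `hashlib.blake2b` for the third-party `xxhash` package, and the threshold, chunk size, and head/middle/tail selection are assumptions, not the module's actual algorithm:

```python
import hashlib

SAMPLE_THRESHOLD = 4 * 1024 * 1024   # assumed cutoff between full and sampled hashing
CHUNK = 64 * 1024                    # assumed sample size

def compute_media_content_hash(data: bytes) -> str:
    # Stand-in: the real module uses XxHash64 (third-party `xxhash`);
    # blake2b is used here only so the example runs with the stdlib.
    h = hashlib.blake2b(digest_size=8)
    if len(data) <= SAMPLE_THRESHOLD:
        h.update(data)
    else:
        # Sampling: hash head, middle, and tail chunks plus the total
        # length, trading exactness for speed on large media files.
        mid = len(data) // 2
        h.update(data[:CHUNK])
        h.update(data[mid:mid + CHUNK])
        h.update(data[-CHUNK:])
        h.update(len(data).to_bytes(8, "little"))
    return h.hexdigest()
```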
## External API
|
||||
@@ -52,9 +70,10 @@ None — internal component, consumed by API layer.

- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
- Video frames decoded from in-memory bytes via PyAV (`av.open(BytesIO)`)
- Images decoded from in-memory bytes via `cv2.imdecode`
- Video bytes concurrently written to persistent storage path in background thread
- All inference processing is in-memory

## Implementation Details
@@ -92,8 +111,10 @@ None — internal component, consumed by API layer.

- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters

### Video Processing
### Video Processing (PyAV-based — AZ-173)

- Reads video from in-memory `BytesIO` via `av.open` (no filesystem read needed)
- Concurrently writes bytes to `save_path` for persistent storage
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
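The validity heuristics above can be sketched as a single predicate — the thresholds and record shapes are assumptions for illustration, not the project's actual values:

```python
# Emit a new annotation frame only when it differs meaningfully from
# the last emitted one. All thresholds below are assumed defaults.
def is_annotation_valid(prev, cur, min_time_gap=1.0, min_move=20.0,
                        min_conf_gain=0.1):
    if prev is None:
        return True
    if cur["t"] - prev["t"] >= min_time_gap:        # time gap
        return True
    if len(cur["dets"]) > len(prev["dets"]):        # detection count increase
        return True
    for a, b in zip(prev["dets"], cur["dets"]):
        dx, dy = b["x"] - a["x"], b["y"] - a["y"]
        if (dx * dx + dy * dy) ** 0.5 >= min_move:  # spatial movement
            return True
        if b["conf"] - a["conf"] >= min_conf_gain:  # confidence improvement
            return True
    return False
```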
@@ -107,9 +128,9 @@ None — internal component, consumed by API layer.

- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- `init_ai()` called on every detection entry point — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect_image` invocation

## Dependency Graph
@@ -122,6 +143,7 @@ graph TD

    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client
    main --> media_hash
```

## Logging Strategy
@@ -2,18 +2,18 @@

## Overview

**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.
**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, media lifecycle management, DB-driven configuration, and authentication token forwarding.

**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.
**Pattern**: Controller layer — thin API surface that delegates inference to the Inference Pipeline and manages media records via the Annotations service.

**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels), Annotations service (AI settings, media records).
**Downstream**: None (top-level, client-facing).

## Modules

| Module | Role |
|--------|------|
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming, media lifecycle, DB-driven config resolution |

## External API Specification
@@ -24,6 +24,7 @@

{
  "status": "healthy",
  "aiAvailability": "Enabled",
  "engineType": "onnx",
  "errorMessage": null
}
```
@@ -31,7 +32,7 @@

### POST /detect

**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
**Input**: Multipart form — `file` (image or video bytes), optional `config` (JSON string), optional auth headers.
**Response**: `list[DetectionDto]`
```json
[
@@ -46,14 +47,15 @@

  }
]
```
**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).
**Behavior** (AZ-173, AZ-175): Accepts both images and videos. Detects upload kind by extension, falls back to content probing. If authenticated: computes content hash, persists to storage, creates media record, tracks status lifecycle (New → AI Processing → AI Processed / Error).
**Errors**: 400 (empty/invalid image data), 422 (runtime error), 503 (engine unavailable).
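The extension-then-content-probing step might look like the following sketch; the extension sets and magic numbers are common defaults assumed for illustration, not the service's actual lists:

```python
from pathlib import Path

# Assumed extension sets — the real endpoint's lists may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mkv", ".mov"}

def detect_upload_kind(filename: str, data: bytes) -> str:
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    # Content-probing fallback on well-known magic bytes.
    if data[:3] == b"\xff\xd8\xff" or data[:8] == b"\x89PNG\r\n\x1a\n":
        return "image"
    if data[4:8] == b"ftyp":   # ISO BMFF container (MP4/MOV)
        return "video"
    return "unknown"
```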
### POST /detect/{media_id}

**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
**Response**: `{"status": "started", "mediaId": "..."}` (202-style).
**Errors**: 409 (duplicate detection for same media_id).
**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.
**Behavior** (AZ-174): Resolves media path and AI settings from Annotations service. Merges DB settings with client overrides. Reads file bytes from resolved path, dispatches `run_detect_image` or `run_detect_video`.
**Errors**: 409 (duplicate detection), 503 (media path not resolved).
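The settings merge (nested sections, camelCase/snake_case variants, client overrides winning) can be sketched as follows — the helper names here are hypothetical, and the service's `_resolve_media_for_detect` is more involved:

```python
import re

def _snake(key: str) -> str:
    # Normalize camelCase DB keys to snake_case (assumed convention).
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()

def merge_ai_settings(db_settings: dict, overrides: dict) -> dict:
    merged = {_snake(k): v for k, v in db_settings.items()}
    for k, v in overrides.items():
        k = _snake(k)
        if isinstance(v, dict) and isinstance(merged.get(k), dict):
            merged[k] = merge_ai_settings(merged[k], v)  # recurse into nested section
        else:
            merged[k] = v  # client override wins over DB value
    return merged
```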

### GET /detect/stream
@@ -66,9 +68,10 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m

## Data Access Patterns

- In-memory state:
  - `_active_detections: dict[str, bool]` — guards against duplicate media processing
  - `_active_detections: dict[str, asyncio.Task]` — guards against duplicate media processing
  - `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
- No database access
- Persistent media storage to `VIDEOS_DIR` / `IMAGES_DIR` (AZ-175)
- No direct database access — media records managed via Annotations service HTTP API
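The `_active_detections` guard can be sketched as a task registry keyed by media id; the done-callback cleanup and the 409 mapping are assumptions about the real endpoint:

```python
import asyncio

# One asyncio.Task per media_id; a second request for the same id is
# rejected while its task is still running (the endpoint would return 409).
_active_detections: dict[str, asyncio.Task] = {}

def try_start_detection(media_id: str, coro) -> bool:
    task = _active_detections.get(media_id)
    if task is not None and not task.done():
        coro.close()   # discard the rejected coroutine
        return False
    task = asyncio.get_running_loop().create_task(coro)
    # Assumed cleanup: drop the entry once the task finishes.
    task.add_done_callback(lambda _: _active_detections.pop(media_id, None))
    _active_detections[media_id] = task
    return True
```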
## Implementation Details
@@ -76,8 +79,10 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m

- `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
- SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
- `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
- Annotations posted to external service with base64-encoded frame image
- `TokenManager.decode_user_id` extracts user identity from multiple JWT claim formats (sub, userId, nameid, SAML)
- DB-driven config via `_resolve_media_for_detect`: fetches AI settings from Annotations, merges nested sections and casing variants
- Media lifecycle: `_post_media_record` + `_put_media_status` manage status transitions via Annotations API
- Content hashing via `compute_media_content_hash` (XxHash64 with sampling) for media deduplication
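The TokenManager's unverified JWT decoding can be sketched in a few lines — base64url-decode the middle segment with no signature check (as noted above); the claim precedence order is an assumption:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    # Take the payload segment and restore the base64url padding that
    # JWTs strip; no signature verification is performed.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

def decode_user_id(token: str):
    # Assumed claim precedence; the real decode_user_id also handles
    # SAML-style claim URIs.
    claims = decode_jwt_payload(token)
    for key in ("sub", "userId", "nameid"):
        if key in claims:
            return str(claims[key])
    return None
```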
## Caveats
@@ -88,6 +93,7 @@ data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "m

- SSE queue overflow silently drops events (QueueFull caught and ignored)
- JWT token handling has no signature verification — relies entirely on the Annotations service for auth
- No graceful shutdown handling for in-progress detections
- Media record creation failures are silently caught (detection proceeds regardless)

## Dependency Graph
@@ -96,6 +102,7 @@ graph TD

    main --> inference
    main --> constants_inf
    main --> loader_http_client
    main --> media_hash
```

## Logging Strategy