Mirror of https://github.com/azaion/detections.git (synced 2026-04-22 10:36:32 +00:00).

Commit: Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.
# Module: ai_availability_status

## Purpose

Thread-safe status tracker for the AI engine lifecycle (downloading, converting, uploading, enabled, warning, error).

## Public Interface

### Enum: AIAvailabilityEnum

| Value | Name | Meaning |
|-------|------|---------|
| 0 | NONE | Initial state, not yet initialized |
| 10 | DOWNLOADING | Model download in progress |
| 20 | CONVERTING | ONNX-to-TensorRT conversion in progress |
| 30 | UPLOADING | Converted model upload in progress |
| 200 | ENABLED | Engine ready for inference |
| 300 | WARNING | Operational with warnings |
| 500 | ERROR | Failed, not operational |

### Class: AIAvailabilityStatus

| Field | Type | Description |
|-------|------|-------------|
| `status` | AIAvailabilityEnum | Current status |
| `error_message` | str or None | Error/warning details |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `()` | Sets status=NONE, error_message=None |
| `__str__` | `() -> str` | Thread-safe formatted string: `"StatusText ErrorText"` |
| `serialize` | `() -> bytes` | Thread-safe msgpack serialization `{s: status, m: error_message}` **(legacy — not called in current codebase)** |
| `set_status` | `(AIAvailabilityEnum status, str error_message=None) -> void` | Thread-safe status update; logs via constants_inf (error or info) |

## Internal Logic

All public methods acquire a `threading.Lock` before reading/writing status fields. `set_status` logs the transition: errors go to `constants_inf.logerror`, normal transitions go to `constants_inf.log`.
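
The locking pattern can be sketched in plain Python (an illustrative reconstruction, not the module's Cython source; the `constants_inf` logging calls are omitted here):

```python
import threading
from enum import IntEnum

class AIAvailabilityEnum(IntEnum):
    NONE = 0
    DOWNLOADING = 10
    CONVERTING = 20
    UPLOADING = 30
    ENABLED = 200
    WARNING = 300
    ERROR = 500

class AIAvailabilityStatus:
    def __init__(self):
        self._lock = threading.Lock()
        self.status = AIAvailabilityEnum.NONE
        self.error_message = None

    def set_status(self, status, error_message=None):
        # Every read/write of the shared fields happens under the lock, so
        # async handlers and executor threads always see a consistent pair.
        with self._lock:
            self.status = status
            self.error_message = error_message

    def __str__(self):
        with self._lock:
            return f"{self.status.name} {self.error_message or ''}".strip()
```

Holding the lock in `__str__` as well prevents a reader from observing a new `status` paired with a stale `error_message`.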

## Dependencies

- **External**: `msgpack`, `threading`
- **Internal**: `constants_inf` (logging)

## Consumers

- `inference` — creates instance, calls `set_status` during engine lifecycle, exposes `ai_availability_status` for health checks
- `main` — reads `ai_availability_status` via inference for `/health` endpoint

## Data Models

- `AIAvailabilityEnum` — status enum
- `AIAvailabilityStatus` — stateful status holder

## Configuration

None.

## External Integrations

None.

## Security

Thread-safe via Lock — safe for concurrent access from FastAPI async + ThreadPoolExecutor.

## Tests

None found.

# Module: ai_config

## Purpose

Data class holding all AI recognition configuration parameters, with factory methods for deserialization from msgpack and dict formats.

## Public Interface

### Class: AIRecognitionConfig

#### Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `frame_period_recognition` | int | 4 | Process every Nth frame in video |
| `frame_recognition_seconds` | double | 2.0 | Minimum seconds between valid video annotations |
| `probability_threshold` | double | 0.25 | Minimum detection confidence |
| `tracking_distance_confidence` | double | 0.0 | Distance threshold for tracking (model-width units) |
| `tracking_probability_increase` | double | 0.0 | Required confidence increase for tracking update |
| `tracking_intersection_threshold` | double | 0.6 | IoU threshold for overlapping detection removal |
| `file_data` | bytes | `b''` | Raw file data (msgpack use) |
| `paths` | list[str] | `[]` | Media file paths to process |
| `model_batch_size` | int | 1 | Batch size for inference |
| `big_image_tile_overlap_percent` | int | 20 | Tile overlap percentage for large image splitting |
| `altitude` | double | 400 | Camera altitude in meters |
| `focal_length` | double | 24 | Camera focal length in mm |
| `sensor_width` | double | 23.5 | Camera sensor width in mm |

#### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `from_msgpack` | `(bytes data) -> AIRecognitionConfig` | Static cdef; deserializes from msgpack binary |
| `from_dict` | `(dict data) -> AIRecognitionConfig` | Static def; deserializes from Python dict |

## Internal Logic

Both factory methods apply defaults for missing keys. `from_msgpack` uses compact single-character keys (`f_pr`, `pt`, `t_dc`, etc.) while `from_dict` uses full descriptive keys.

**Legacy/unused**: `from_msgpack()` is defined but never called in the current codebase — it is a remnant of a previous queue-based architecture. Only `from_dict()` is actively used. The `file_data` field is stored but never read anywhere.
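
The defaults-for-missing-keys behavior of `from_dict` can be sketched as follows (a plain-Python illustration mirroring the field table above; `file_data` is omitted since it is never read, and this is not the Cython source):

```python
class AIRecognitionConfig:
    # Defaults mirror the field table above.
    DEFAULTS = {
        "frame_period_recognition": 4,
        "frame_recognition_seconds": 2.0,
        "probability_threshold": 0.25,
        "tracking_distance_confidence": 0.0,
        "tracking_probability_increase": 0.0,
        "tracking_intersection_threshold": 0.6,
        "paths": [],
        "model_batch_size": 1,
        "big_image_tile_overlap_percent": 20,
        "altitude": 400,
        "focal_length": 24,
        "sensor_width": 23.5,
    }

    @staticmethod
    def from_dict(data):
        cfg = AIRecognitionConfig()
        # Any key missing from the input dict falls back to its default.
        for key, default in AIRecognitionConfig.DEFAULTS.items():
            setattr(cfg, key, data.get(key, default))
        return cfg
```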

## Dependencies

- **External**: `msgpack`
- **Internal**: none (leaf module)

## Consumers

- `inference` — creates config from dict, uses all fields for frame selection, detection filtering, image tiling, and tracking

## Data Models

- `AIRecognitionConfig` — the sole data class

## Configuration

Camera/altitude parameters (`altitude`, `focal_length`, `sensor_width`) are used for ground sampling distance calculation in aerial image processing.

## External Integrations

None.

## Security

None.

## Tests

None found.

# Module: annotation

## Purpose

Data models for object detections and annotations (grouped detections for a frame/tile with metadata).

## Public Interface

### Class: Detection

Represents a single bounding box detection in normalized coordinates.

| Field | Type | Description |
|-------|------|-------------|
| `x` | double | Center X (normalized 0..1) |
| `y` | double | Center Y (normalized 0..1) |
| `w` | double | Width (normalized 0..1) |
| `h` | double | Height (normalized 0..1) |
| `cls` | int | Class ID (maps to constants_inf.annotations_dict) |
| `confidence` | double | Detection confidence (0..1) |
| `annotation_name` | str | Parent annotation name (set after construction) |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(double x, y, w, h, int cls, double confidence)` | Constructor |
| `__str__` | `() -> str` | Format: `"{cls}: {x} {y} {w} {h}, prob: {confidence}%"` |
| `__eq__` | `(other) -> bool` | Two detections are equal if all bbox coordinates differ by less than `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` |
| `overlaps` | `(Detection det2, float confidence_threshold) -> bool` | Returns True if IoU-like overlap ratio (overlap area / min area) exceeds threshold |

### Class: Annotation

Groups detections for a single frame or image tile.

| Field | Type | Description |
|-------|------|-------------|
| `name` | str | Unique annotation name (encodes tile/time info) |
| `original_media_name` | str | Source media filename (without extension/spaces) |
| `time` | long | Timestamp in milliseconds (video) or 0 (image) |
| `detections` | list[Detection] | Detections found in this frame/tile |
| `image` | bytes | JPEG-encoded frame image (set after validation) |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str name, str original_media_name, long ms, list[Detection] detections)` | Sets annotation_name on all detections |
| `__str__` | `() -> str` | Formatted detection summary |
| `serialize` | `() -> bytes` | Msgpack serialization with compact keys **(legacy — not called in current codebase)** |

## Internal Logic

- `Detection.__eq__` uses `constants_inf.TILE_DUPLICATE_CONFIDENCE_THRESHOLD` (0.01) to determine if two detections at absolute coordinates are duplicates across adjacent tiles.
- `Detection.overlaps` computes the overlap as `overlap_area / min(area1, area2)` — this is not standard IoU but a containment-biased metric.
- `Annotation.__init__` sets `annotation_name` on every child detection.
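
The containment-biased metric can be sketched for center-format boxes (an illustrative reconstruction operating on plain tuples rather than `Detection` objects):

```python
def overlaps(a, b, threshold):
    """a, b: (cx, cy, w, h) boxes in normalized coordinates.
    True when overlap_area / min(area_a, area_b) exceeds threshold."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    # Intersection rectangle (zero extent if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = iw * ih
    smaller = min(a[2] * a[3], b[2] * b[3])
    # Dividing by the smaller area (not the union, as IoU would) biases
    # toward containment: a box fully inside another always scores 1.0.
    return smaller > 0 and overlap / smaller > threshold
```

This is why a small detection nested inside a larger one is always treated as overlapping, which standard IoU with the same threshold would often miss.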

## Dependencies

- **External**: `msgpack`
- **Internal**: `constants_inf` (TILE_DUPLICATE_CONFIDENCE_THRESHOLD constant)

## Consumers

- `inference` — creates Detection and Annotation instances during postprocessing, uses overlaps for NMS, uses equality for tile dedup
- `main` — reads Detection fields for DTO conversion

## Data Models

- `Detection` — bounding box + class + confidence
- `Annotation` — frame/tile container for detections + metadata + image

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

None found.

# Module: constants_inf

## Purpose

Application-wide constants, logging infrastructure, and the object detection class registry loaded from `classes.json`.

## Public Interface

### Constants

| Name | Type | Value | Description |
|------|------|-------|-------------|
| `CONFIG_FILE` | str | `"config.yaml"` | Configuration file path |
| `QUEUE_CONFIG_FILENAME` | str | `"secured-config.json"` | Queue config filename |
| `AI_ONNX_MODEL_FILE` | str | `"azaion.onnx"` | ONNX model filename |
| `CDN_CONFIG` | str | `"cdn.yaml"` | CDN configuration file |
| `MODELS_FOLDER` | str | `"models"` | Directory for model files |
| `SMALL_SIZE_KB` | int | `3` | Small file size threshold (KB) |
| `SPLIT_SUFFIX` | str | `"!split!"` | Delimiter in tiled image names |
| `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` | double | `0.01` | Threshold for tile duplicate detection equality |
| `METERS_IN_TILE` | int | `25` | Physical tile size in meters for large image splitting |
| `weather_switcher_increase` | int | `20` | Offset between weather mode class ID ranges |

### Enum: WeatherMode

| Value | Name | Meaning |
|-------|------|---------|
| 0 | Norm | Normal weather |
| 20 | Wint | Winter |
| 40 | Night | Night |

### Class: AnnotationClass

Fields: `id` (int), `name` (str), `color` (str), `max_object_size_meters` (int).

Represents a detection class with its display metadata and physical size constraint.

### Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `log` | `(str log_message) -> void` | Info-level log via loguru |
| `logerror` | `(str error) -> void` | Error-level log via loguru |
| `format_time` | `(int ms) -> str` | Converts milliseconds to compact time string `HMMSSf` |

### Global: `annotations_dict`

`dict[int, AnnotationClass]` — loaded at module init from `classes.json`. Contains 19 base classes × 3 weather modes (Norm/Wint/Night) = up to 57 entries. Keys are class IDs, values are `AnnotationClass` instances.

## Internal Logic

- On import, reads `classes.json` and builds `annotations_dict` by iterating 3 weather mode offsets (0, 20, 40) and adding class ID offsets. Weather mode names are appended to class names for non-Norm modes.
- Configures loguru with:
  - File sink: `Logs/log_inference_YYYYMMDD.txt` (daily rotation, 30-day retention)
  - Stdout: INFO/DEBUG/SUCCESS levels
  - Stderr: WARNING and above
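
The `annotations_dict` construction can be sketched as follows (class names and the shape of the parsed `classes.json` data are illustrative assumptions; values are names here rather than full `AnnotationClass` instances):

```python
from enum import IntEnum

class WeatherMode(IntEnum):
    Norm = 0
    Wint = 20
    Night = 40

def build_annotations_dict(base_classes):
    """base_classes: dict[int, str] of base class IDs to names, e.g.
    parsed from classes.json. Returns one entry per (class, mode) pair."""
    annotations = {}
    for mode in WeatherMode:
        for class_id, name in base_classes.items():
            # Non-Norm modes get the mode name appended to the class name;
            # the mode value (0, 20, 40) offsets the class ID.
            full = name if mode is WeatherMode.Norm else f"{name} {mode.name}"
            annotations[class_id + mode.value] = full
    return annotations
```

With 19 base classes this yields up to 57 entries, matching the count stated above.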

## Legacy / Orphaned Declarations

The `.pxd` header declares `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, and `ANNOTATIONS_QUEUE` (with comments referencing RabbitMQ) that are **not defined** in the `.pyx` implementation. These are remnants of a previous queue-based architecture and are unused.

## Dependencies

- **External**: `json`, `sys`, `loguru`
- **Internal**: none (leaf module)

## Consumers

- `ai_availability_status` (logging)
- `annotation` (tile duplicate threshold)
- `onnx_engine` (logging)
- `tensorrt_engine` (logging)
- `inference` (logging, constants, annotations_dict, format_time, SPLIT_SUFFIX, METERS_IN_TILE, MODELS_FOLDER, AI_ONNX_MODEL_FILE)
- `main` (annotations_dict for label lookup)

## Data Models

- `AnnotationClass` — detection class metadata
- `WeatherMode` — enum for weather conditions

## Configuration

- Reads `classes.json` at import time (must exist in working directory)

## External Integrations

None.

## Security

None.

## Tests

None found.

# Module: inference

## Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

## Public Interface

### Class: Inference

#### Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | object | internal | LoaderHttpClient instance |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `model_width` | int | internal | Model input width in pixels |
| `model_height` | int | internal | Model input height in pixels |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |

#### Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |

## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built TensorRT engine from loader
3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
4. If no GPU → load OnnxEngine from ONNX model bytes

### Preprocessing

- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference

### Postprocessing

- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
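
The greedy overlap removal step can be sketched as follows (a minimal stand-in where detections are `(confidence, box)` tuples and `overlaps` is any pairwise predicate; the real code operates on `Detection` objects):

```python
def remove_overlapping_detections(detections, overlaps):
    """Greedy filter: visit detections in descending confidence order,
    keep each one only if it does not overlap a detection already kept."""
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        if not any(overlaps(det[1], k[1]) for k in kept):
            kept.append(det)
    return kept
```

Because the scan is highest-confidence first, whenever two detections overlap the higher-confidence one is always the survivor.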

### Image Processing

- Small images (≤1.5× model size): processed as single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size computed from GSD × pixel dimensions.

### Video Processing

- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity: must differ from the previous annotation by any of:
  - Time gap ≥ `frame_recognition_seconds`
  - More detections than previous
  - Any detection moved beyond `tracking_distance_confidence` threshold
  - Any detection confidence increased beyond `tracking_probability_increase`
- Valid frames get JPEG-encoded image attached

### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
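
A worked example with the default camera parameters from `ai_config` (altitude 400 m, focal length 24 mm, sensor width 23.5 mm); the image width of 4000 px is an assumed illustrative value:

```python
def gsd_meters_per_pixel(sensor_width_mm, altitude_m, focal_length_mm, image_width_px):
    # GSD = sensor_width * altitude / (focal_length * image_width).
    # sensor_width and focal_length share units (mm), so they cancel,
    # leaving meters of ground per image pixel.
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)

def tile_size_px(meters_in_tile, gsd):
    # Tile side in pixels so each tile covers METERS_IN_TILE meters of ground.
    return int(meters_in_tile / gsd)
```

For a 4000 px wide image this gives roughly 0.098 m/px, so a 25 m tile is about 255 px on a side.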

## Dependencies

- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`

## Data Models

Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).

## Configuration

All runtime config comes via `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module level via pynvml).

## External Integrations

- **Loader service** (via loader_client): model download/upload

## Security

None.

## Tests

None found.

# Module: inference_engine

## Purpose

Abstract base class defining the interface that all inference engine implementations must follow.

## Public Interface

### Class: InferenceEngine

#### Fields

| Field | Type | Description |
|-------|------|-------------|
| `batch_size` | int | Number of images per inference batch |

#### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Stores batch_size |
| `get_input_shape` | `() -> tuple` | Returns (height, width) of model input. Abstract — raises `NotImplementedError` |
| `get_batch_size` | `() -> int` | Returns `self.batch_size` |
| `run` | `(input_data) -> list` | Runs inference on preprocessed input blob. Abstract — raises `NotImplementedError` |

## Internal Logic

Pure abstract class. All methods except `__init__` and `get_batch_size` raise `NotImplementedError` and must be overridden by subclasses (`OnnxEngine`, `TensorRTEngine`).
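
The contract can be sketched in plain Python (the real module is Cython; method names and the signature match the table above):

```python
class InferenceEngine:
    def __init__(self, model_bytes, batch_size=1, **kwargs):
        self.batch_size = batch_size

    def get_input_shape(self):
        # Subclasses return (height, width) of the model input.
        raise NotImplementedError

    def get_batch_size(self):
        return self.batch_size

    def run(self, input_data):
        # Subclasses run inference on a preprocessed blob.
        raise NotImplementedError
```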

## Dependencies

- **External**: `numpy` (declared in .pxd, not used in base)
- **Internal**: none (leaf module)

## Consumers

- `onnx_engine` — subclass
- `tensorrt_engine` — subclass
- `inference` — type reference in .pxd

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

None found.

# Module: loader_http_client

## Purpose

HTTP client for downloading and uploading model files (and other binary resources) via an external Loader microservice.

## Public Interface

### Class: LoadResult

Simple result wrapper.

| Field | Type | Description |
|-------|------|-------------|
| `err` | str or None | Error message if operation failed |
| `data` | bytes or None | Response payload on success |

### Class: LoaderHttpClient

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str base_url)` | Stores base URL, strips trailing slash |
| `load_big_small_resource` | `(str filename, str directory) -> LoadResult` | POST to `/load/(unknown)` with JSON body `{filename, folder}`, returns raw bytes |
| `upload_big_small_resource` | `(bytes content, str filename, str directory) -> LoadResult` | POST to `/upload/(unknown)` with multipart file + form data `{folder}` |
| `stop` | `() -> None` | No-op placeholder |

## Internal Logic

Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Errors are logged via loguru but never raised.
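
The error-or-data pattern can be sketched without the HTTP layer (the `requests` call is replaced by an injected callable so the sketch stays self-contained; `load_resource` is a hypothetical stand-in for the client methods):

```python
class LoadResult:
    def __init__(self, err=None, data=None):
        self.err = err    # error message string on failure
        self.data = data  # payload bytes on success

def load_resource(fetch):
    """fetch: a zero-argument callable returning bytes.
    All exceptions are captured into LoadResult.err, never raised."""
    try:
        return LoadResult(data=fetch())
    except Exception as e:
        return LoadResult(err=str(e))
```

Callers therefore branch on `result.err` rather than wrapping every call in try/except.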

## Dependencies

- **External**: `requests`, `loguru`
- **Internal**: none (leaf module)

## Consumers

- `inference` — downloads ONNX/TensorRT models, uploads converted TensorRT engines
- `main` — instantiates client with `LOADER_URL`

## Data Models

- `LoadResult` — operation result with error-or-data semantics

## Configuration

- `base_url` — provided at construction time, sourced from `LOADER_URL` environment variable in `main.py`

## External Integrations

| Integration | Protocol | Endpoint Pattern |
|-------------|----------|-----------------|
| Loader service | HTTP POST | `/load/(unknown)` (download), `/upload/(unknown)` (upload) |

## Security

None (no auth headers sent to loader).

## Tests

None found.

# Module: main

## Purpose

FastAPI application entry point — exposes HTTP API for object detection on images and video media, health checks, and Server-Sent Events (SSE) streaming of detection results.

## Public Interface

### API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Returns AI engine availability status |
| POST | `/detect` | Single image detection (multipart file upload) |
| POST | `/detect/{media_id}` | Start async detection on media from loader service |
| GET | `/detect/stream` | SSE stream of detection events |

### DTOs (Pydantic Models)

| Model | Fields | Description |
|-------|--------|-------------|
| `DetectionDto` | centerX, centerY, width, height, classNum, label, confidence | Single detection result |
| `DetectionEvent` | annotations (list[DetectionDto]), mediaId, mediaStatus, mediaPercent | SSE event payload |
| `HealthResponse` | status, aiAvailability, errorMessage | Health check response |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width, paths | Configuration input for media detection |

### Class: TokenManager

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str access_token, str refresh_token)` | Stores tokens |
| `get_valid_token` | `() -> str` | Returns access_token; auto-refreshes if expiring within 60s |

## Internal Logic

### `/health`

Always returns `HealthResponse` with `status="healthy"`; `aiAvailability` reflects the engine's `AIAvailabilityStatus`. On exception, returns `aiAvailability="None"`.

### `/detect` (single image)

1. Reads uploaded file bytes
2. Parses optional JSON config
3. Runs `inference.detect_single_image` in ThreadPoolExecutor (max 2 workers)
4. Returns list of DetectionDto

Error mapping: RuntimeError("not available") → 503, RuntimeError → 422, ValueError → 400.

### `/detect/{media_id}` (async media)

1. Checks for duplicate active detection (409 if already running)
2. Extracts auth tokens from Authorization header and x-refresh-token header
3. Creates `asyncio.Task` for background detection
4. Detection runs `inference.run_detect` in ThreadPoolExecutor
5. Callbacks push `DetectionEvent` to all SSE queues
6. If auth token present, also POSTs annotations to the Annotations service
7. Returns immediately: `{"status": "started", "mediaId": media_id}`

### `/detect/stream` (SSE)

- Creates asyncio.Queue per client (maxsize=100)
- Yields `data: {json}\n\n` SSE format
- Cleans up queue on disconnect
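
The SSE framing can be sketched as follows (the payload fields follow the `DetectionEvent` DTO; the field values are illustrative):

```python
import json

def sse_frame(event: dict) -> str:
    # Server-Sent Events: each message is "data: <payload>\n\n";
    # the blank line terminates the event.
    return f"data: {json.dumps(event)}\n\n"
```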

### Token Management

- Decodes JWT exp claim from base64 payload (no signature verification)
- Auto-refreshes via POST to `{ANNOTATIONS_URL}/auth/refresh` when within 60s of expiry
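
Reading the `exp` claim without signature verification can be sketched as follows (standard base64url handling; this mirrors the approach described above, not the exact code):

```python
import base64
import json
import time

def jwt_expires_within(token: str, seconds: int = 60) -> bool:
    """Reads the exp claim from a JWT payload without verifying the
    signature, and reports whether it expires within `seconds`."""
    payload_b64 = token.split(".")[1]
    # base64url payloads usually omit "=" padding; restore it first.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"] - time.time() < seconds
```

Note that, as the Security section states, this performs no local token validation beyond reading the expiry.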

### Annotations Service Integration

- POST to `{ANNOTATIONS_URL}/annotations` with:
  - `mediaId`, `source: 0`, `videoTime` (formatted from ms), `detections` (list of dto dicts)
  - Optional base64-encoded `image`
  - Bearer token in Authorization header

## Dependencies

- **External**: `asyncio`, `base64`, `json`, `os`, `time`, `concurrent.futures`, `typing`, `requests`, `fastapi`, `pydantic`
- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation)

## Consumers

None (entry point).

## Data Models

- `DetectionDto`, `DetectionEvent`, `HealthResponse`, `AIConfigDto` — Pydantic models for API
- `TokenManager` — JWT token lifecycle

## Configuration

| Env Var | Default | Description |
|---------|---------|-------------|
| `LOADER_URL` | `http://loader:8080` | Loader service base URL |
| `ANNOTATIONS_URL` | `http://annotations:8080` | Annotations service base URL |

## External Integrations

| Service | Protocol | Purpose |
|---------|----------|---------|
| Loader | HTTP (via LoaderHttpClient) | Model loading |
| Annotations | HTTP POST | Auth refresh (`/auth/refresh`), annotation posting (`/annotations`) |

## Security

- Bearer token from request headers, refreshed via Annotations service
- JWT exp decoded (base64, no signature verification) — token validation is not performed locally
- No CORS configuration
- No rate limiting
- No input validation on media_id path parameter beyond string type

## Tests

None found.

# Module: onnx_engine

## Purpose

ONNX Runtime-based inference engine — CPU/CUDA fallback when TensorRT is unavailable.

## Public Interface

### Class: OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA > CPU provider priority. Reads input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model if not dynamic, else from constructor) |
| `run` | `(input_data) -> list` | Runs session inference, returns output tensors |

## Internal Logic

- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]` — ONNX Runtime selects the best available.
- If the model's batch dimension is dynamic (-1), uses the constructor's `batch_size` parameter.
- Logs model input metadata and custom metadata map at init.
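
The dynamic-batch fallback rule can be sketched independently of ONNX Runtime (input shapes follow the usual NCHW layout; the helper name and shape values are illustrative assumptions):

```python
def resolve_batch_size(model_input_shape, constructor_batch_size):
    """model_input_shape: e.g. [batch, channels, height, width] as read
    from the ONNX input tensor. A dynamic batch dimension appears as -1
    (or a symbolic name); in that case fall back to the constructor value."""
    batch = model_input_shape[0]
    if isinstance(batch, int) and batch > 0:
        return batch                   # fixed batch size baked into the model
    return constructor_batch_size      # dynamic: use the caller's value
```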

## Dependencies

- **External**: `onnxruntime`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference` — instantiated when no compatible NVIDIA GPU is found

## Data Models

None (wraps onnxruntime.InferenceSession).

## Configuration

None.

## External Integrations

None directly — model bytes are provided by caller (loaded via `loader_http_client`).

## Security

None.

## Tests

None found.

# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.

## Public Interface

### Class: TensorRTEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes TensorRT engine from bytes, allocates CUDA input/output memory, creates execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy, returns output as numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (default 2GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns engine filename with compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts ONNX model to TensorRT serialized engine. Uses 90% of GPU memory as workspace. Enables FP16 if supported. |

## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode, configures FP16 precision when GPU supports it.
- Default batch size is 4 (vs OnnxEngine's 1).
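
The GPU-specific naming from `get_engine_filename` follows the pattern `azaion.cc_{major}.{minor}_sm_{count}.engine` and can be sketched without CUDA (the device query is replaced by explicit parameters; the example values are illustrative):

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # Encodes compute capability and streaming-multiprocessor count so an
    # engine built on one GPU model is never reused on a different one.
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```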

## Dependencies

- **External**: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference` — instantiated when compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`

## Data Models

None (wraps TensorRT runtime objects).

## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.

## External Integrations

None directly — model bytes provided by caller.

## Security

None.

## Tests

None found.