Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

2026-06-21 05:51:08 +00:00 · 2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,49 @@
+# Acceptance Criteria
+
+## Detection Accuracy
+
+- Detections with confidence below `probability_threshold` (default: 0.25) are filtered out.
+- Overlapping detections with containment ratio > `tracking_intersection_threshold` (default: 0.6) are deduplicated, keeping the higher-confidence detection.
+- Tile duplicate detections are identified when all bounding box coordinates differ by less than 0.01 (TILE_DUPLICATE_CONFIDENCE_THRESHOLD).
+- Physical size filtering: detections exceeding `max_object_size_meters` for their class (defined in classes.json, range 2–20 meters) are removed.
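The first two rules above can be sketched roughly as follows. This is a minimal illustration, not the service's code: the `Det` record, the helper names, and the definition of "containment ratio" as intersection area divided by the smaller box's area are all assumptions.

```python
from typing import List, NamedTuple


class Det(NamedTuple):
    """Hypothetical detection record: corner coordinates plus confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def area(d: Det) -> float:
    return max(0.0, d.x2 - d.x1) * max(0.0, d.y2 - d.y1)


def containment(a: Det, b: Det) -> float:
    """Intersection area divided by the smaller box's area (assumed definition)."""
    iw = min(a.x2, b.x2) - max(a.x1, b.x1)
    ih = min(a.y2, b.y2) - max(a.y1, b.y1)
    if iw <= 0 or ih <= 0:
        return 0.0
    smaller = min(area(a), area(b))
    return (iw * ih) / smaller if smaller > 0 else 0.0


def filter_and_dedupe(dets, probability_threshold=0.25,
                      tracking_intersection_threshold=0.6) -> List[Det]:
    # Rule 1: drop detections whose confidence is below the threshold.
    confident = [d for d in dets if d.confidence >= probability_threshold]
    kept: List[Det] = []
    # Rule 2: visit the highest confidence first so the stronger detection
    # of an overlapping pair is the one that survives.
    for d in sorted(confident, key=lambda d: d.confidence, reverse=True):
        if all(containment(d, k) <= tracking_intersection_threshold for k in kept):
            kept.append(d)
    return kept
```

Sorting by confidence before the overlap check keeps the loop simple: a detection only has to be compared against boxes that have already been accepted.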
+
+## Video Processing
+
+- Frame sampling: every Nth frame processed, controlled by `frame_period_recognition` (default: 4).
+- Minimum annotation interval: `frame_recognition_seconds` (default: 2 seconds) between reported annotations.
+- Tracking: a new annotation is accepted if any detection has moved farther than the `tracking_distance_confidence` threshold or its confidence has increased by more than `tracking_probability_increase`.
+
+## Image Processing
+
+- Images ≤ 1.5× model dimensions (1280×1280): processed as single frame.
+- Larger images: tiled based on ground sampling distance. Tile physical size: 25 meters (METERS_IN_TILE). Tile overlap: `big_image_tile_overlap_percent` (default: 20%).
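+- GSD calculation: `sensor_width * altitude / (focal_length * image_width)`, giving meters per pixel when `sensor_width` and `focal_length` are in mm, `altitude` is in meters, and `image_width` is in pixels.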
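A worked example of the GSD formula, using the documented camera defaults (sensor width 23.5 mm, focal length 24 mm, altitude 400 m). The 6000 px image width is an assumed value, and deriving the tile size in pixels as `METERS_IN_TILE / GSD` is an assumption about how the 25 m tile is applied, not something the criteria state.

```python
def ground_sampling_distance(sensor_width_mm, altitude_m, focal_length_mm, image_width_px):
    """Meters covered by one pixel; the two mm terms cancel each other out."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)


def tile_size_px(gsd_m_per_px, meters_in_tile=25.0):
    """Pixels needed to span one tile of the given physical size (assumed usage)."""
    return int(round(meters_in_tile / gsd_m_per_px))


# Documented defaults plus an assumed 6000 px wide image.
gsd = ground_sampling_distance(23.5, 400, 24, 6000)
print(f"GSD: {gsd:.4f} m/px")                # GSD: 0.0653 m/px
print(f"Tile: {tile_size_px(gsd)} px wide")  # Tile: 383 px wide
```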
+
+## API
+
+- `GET /health` always returns `status: "healthy"`, even if the engine is unavailable; `aiAvailability` indicates the actual engine state.
+- `POST /detect` returns detection results synchronously. Errors: 400 (empty/invalid image), 422 (runtime error), 503 (engine unavailable).
+- `POST /detect/{media_id}` returns immediately with `{"status": "started"}`. Rejects duplicate media_id with 409.
+- `GET /detect/stream` delivers SSE events with `mediaStatus` values: AIProcessing, AIProcessed, Error.
+- SSE queue maximum depth: 100 events per client. Events that arrive while a client's queue is full are silently dropped.
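The bounded per-client queue with silent drop could look roughly like this. It is a sketch under assumptions: the criteria do not say which queue type is used, so `asyncio.Queue` and the `publish` helper are illustrative choices only.

```python
import asyncio

SSE_QUEUE_MAXSIZE = 100  # documented depth; the constant name is hypothetical


def publish(queues, event):
    """Offer an event to every client queue; full queues drop it silently."""
    delivered = 0
    for q in queues:
        try:
            q.put_nowait(event)
            delivered += 1
        except asyncio.QueueFull:
            # Slow consumer: the event is lost for this client only.
            pass
    return delivered
```

Using `put_nowait` instead of `await q.put(...)` means a slow subscriber never blocks the producer, at the cost of that subscriber missing events.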
+
+## Engine Lifecycle
+
+- Engine initialization is lazy (first detection request, not startup).
+- Status transitions: NONE → DOWNLOADING → (CONVERTING → UPLOADING →) ENABLED | WARNING | ERROR.
+- GPU check: NVIDIA GPU with compute capability ≥ 6.1.
+- TensorRT conversion uses FP16 precision when GPU supports fast FP16.
+- Background conversion does not block API responsiveness.
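One literal reading of the status transitions above, written as a lookup table. The `EngineStatus` enum is a hypothetical stand-in, and the table is an assumption: the real service may, for example, allow `ERROR` from any state.

```python
from enum import Enum


class EngineStatus(Enum):  # hypothetical stand-in, not the service's enum
    NONE = "NONE"
    DOWNLOADING = "DOWNLOADING"
    CONVERTING = "CONVERTING"
    UPLOADING = "UPLOADING"
    ENABLED = "ENABLED"
    WARNING = "WARNING"
    ERROR = "ERROR"


S = EngineStatus
FINAL = {S.ENABLED, S.WARNING, S.ERROR}

# CONVERTING and UPLOADING form the optional branch taken when no cached
# TensorRT engine exists (assumed reading of the parenthesized steps).
ALLOWED = {
    S.NONE: {S.DOWNLOADING},
    S.DOWNLOADING: {S.CONVERTING} | FINAL,
    S.CONVERTING: {S.UPLOADING},
    S.UPLOADING: set(FINAL),
}


def is_valid_path(path):
    """True if the sequence starts at NONE and follows only allowed transitions."""
    if not path or path[0] is not S.NONE:
        return False
    return all(b in ALLOWED.get(a, set()) for a, b in zip(path, path[1:]))
```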
+
+## Logging
+
+- Log files: `Logs/log_inference_YYYYMMDD.txt`.
+- Rotation: daily.
+- Retention: 30 days.
+- Console: INFO/DEBUG/SUCCESS to stdout, WARNING+ to stderr.
+
+## Object Classes
+
+- 19 base detection classes defined in `classes.json`.
+- 3 weather modes (Norm, Wint, Night) — total up to 57 class variants.
+- Each class has: Id, Name, ShortName, Color, MaxSizeM (max physical size in meters).
@@ -0,0 +1,76 @@
+# Input Data Parameters
+
+## Media Input
+
+### Single Image Detection (POST /detect)
+
+| Parameter | Type | Source | Description |
+|-----------|------|--------|-------------|
+| file | bytes (multipart) | Client upload | Image file (JPEG, PNG, etc. — any format OpenCV can decode) |
+| config | JSON string (optional) | Query/form field | AIConfigDto overrides |
+
+### Media Detection (POST /detect/{media_id})
+
+| Parameter | Type | Source | Description |
+|-----------|------|--------|-------------|
+| media_id | string | URL path | Identifier for media in the Loader service |
+| AIConfigDto body | JSON (optional) | Request body | Configuration overrides |
+| Authorization header | Bearer token | HTTP header | JWT for Annotations service |
+| x-refresh-token header | string | HTTP header | Refresh token for JWT renewal |
+
+Media files (images and videos) are resolved by the Inference pipeline via paths in the config. The Loader service provides model files, not media files directly.
+
+## Configuration Input (AIConfigDto / AIRecognitionConfig)
+
+| Field | Type | Default | Range/Meaning |
+|-------|------|---------|---------------|
+| frame_period_recognition | int | 4 | Process every Nth video frame |
+| frame_recognition_seconds | int | 2 | Minimum seconds between video annotations |
+| probability_threshold | float | 0.25 | Minimum detection confidence (0..1) |
+| tracking_distance_confidence | float | 0.0 | Movement threshold for tracking (model-width fraction) |
+| tracking_probability_increase | float | 0.0 | Confidence increase threshold for tracking |
+| tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
+| model_batch_size | int | 1 | Inference batch size |
+| big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0-100%) |
+| altitude | float | 400 | Camera altitude in meters |
+| focal_length | float | 24 | Camera focal length in mm |
+| sensor_width | float | 23.5 | Camera sensor width in mm |
+| paths | list[str] | [] | Media file paths to process |
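The defaults in this table can be mirrored in a small dataclass to show how partial overrides might merge with them. `RecognitionConfig` is a hypothetical stand-in for `AIRecognitionConfig`, and ignoring unknown keys in `from_dict` is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RecognitionConfig:  # hypothetical stand-in for AIRecognitionConfig
    frame_period_recognition: int = 4
    frame_recognition_seconds: int = 2
    probability_threshold: float = 0.25
    tracking_distance_confidence: float = 0.0
    tracking_probability_increase: float = 0.0
    tracking_intersection_threshold: float = 0.6
    model_batch_size: int = 1
    big_image_tile_overlap_percent: int = 20
    altitude: float = 400
    focal_length: float = 24
    sensor_width: float = 23.5
    paths: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        # Keep only keys that match a declared field (assumed behavior).
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in (data or {}).items() if k in known})


cfg = RecognitionConfig.from_dict({"altitude": 120.0, "probability_threshold": 0.4})
print(cfg.altitude, cfg.probability_threshold, cfg.frame_period_recognition)
```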
+
+## Model Files
+
+| File | Format | Source | Description |
+|------|--------|--------|-------------|
+| azaion.onnx | ONNX | Loader service | Base detection model |
+| azaion.cc_{M}.{m}_sm_{N}.engine | TensorRT | Loader service (cached) | GPU-specific compiled engine |
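A sketch of how the engine file name pattern might be filled in. It assumes `{M}` and `{m}` are the major and minor parts of the GPU compute capability and `{N}` is the SM count; both helper names are hypothetical and are not the service's own functions.

```python
def example_engine_filename(cc_major, cc_minor, sm_count, base="azaion"):
    """Build a GPU-specific engine file name following the documented pattern."""
    return f"{base}.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"


def meets_min_compute_capability(cc_major, cc_minor, minimum=(6, 1)):
    """Tuple comparison orders by major first, then by minor."""
    return (cc_major, cc_minor) >= minimum


print(example_engine_filename(8, 6, 82))   # azaion.cc_8.6_sm_82.engine
print(meets_min_compute_capability(6, 0))  # False
```

Because the name embeds GPU properties, two hosts with different GPUs would resolve to different cached engine files.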
+
+## Static Data
+
+### classes.json
+
+Array of 19 objects, each with:
+
+| Field | Type | Example | Description |
+|-------|------|---------|-------------|
+| Id | int | 0 | Class identifier |
+| Name | string | "ArmorVehicle" | English class name |
+| ShortName | string | "Броня" | Ukrainian short name |
+| Color | string | "#ff0000" | Hex color for visualization |
+| MaxSizeM | int | 8 | Maximum physical object size in meters |
+
+## Data Volumes
+
+- Single image: up to tens of megapixels (aerial imagery). Large images are tiled.
+- Video: processed frame-by-frame with configurable sampling rate.
+- Model file: ONNX model size depends on architecture (typically 10-100 MB). TensorRT engines are GPU-specific compiled versions.
+- Detection output: up to 300 detections per frame (model limit).
+
+## Data Formats
+
+| Data | Format | Serialization |
+|------|--------|---------------|
+| API requests | HTTP multipart / JSON | Pydantic validation |
+| API responses | JSON | Pydantic model_dump |
+| SSE events | text/event-stream | JSON per event |
+| Internal config | Python dict | AIRecognitionConfig.from_dict() |
+| Legacy (unused) | msgpack | serialize() / from_msgpack() |
@@ -0,0 +1,28 @@
+# Problem Statement
+
+## What is this system?
+
+Azaion.Detections is an AI-powered object detection microservice designed for aerial reconnaissance. It processes drone and satellite imagery (both still images and video) to automatically identify and locate military and infrastructure objects — including armored vehicles, trucks, artillery, trenches, personnel, camouflage nets, buildings, and more.
+
+## What problem does it solve?
+
+Manual analysis of aerial imagery is slow, error-prone, and does not scale. When monitoring large areas from drones or satellites, a human analyst cannot review every frame in real time. This service automates the detection process: given an image or video feed, it returns structured bounding boxes with object classifications and confidence scores, enabling rapid situational awareness.
+
+## Who are the users?
+
+- **Client applications** that submit media for analysis (via HTTP API)
+- **Downstream services** (Annotations service) that store and present detection results
+- **Real-time consumers** that subscribe to Server-Sent Events for live detection updates during video processing
+
+## How does it work at a high level?
+
+1. A client sends an image or triggers detection on media files available in the Loader service
+2. The service preprocesses frames — resizing, normalizing, and for large aerial images, splitting into GSD-based tiles to preserve small object detail
+3. Frames are batched and run through a YOLO-based object detection model via TensorRT (GPU) or ONNX Runtime (CPU fallback)
+4. Raw model output is postprocessed: coordinate normalization, confidence thresholding, overlapping detection removal, physical size filtering, and tile deduplication
+5. Results are returned as structured DTOs (bounding box center, dimensions, class label, confidence)
+6. For video/batch processing, results are streamed in real-time via SSE and optionally posted to an external Annotations service
+
+## Domain context
+
+The system operates in a military/defense aerial reconnaissance context. The 19 object classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier, Ammo, Protect.Struct) reflect objects of interest in ground surveillance. Three weather modes (Normal, Winter, Night) provide environment-specific detection variants. Physical size filtering using ground sampling distance ensures detections are physically plausible given camera altitude and optics.
@@ -0,0 +1,33 @@
+# Restrictions
+
+## Hardware
+
+- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
+- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as workspace. Minimum ~2 GB GPU memory assumed (default fallback value).
+- **Concurrency**: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.
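The concurrency limit can be illustrated with a two-worker pool that an async handler hands blocking work to. This is a sketch only: `run_inference` is a placeholder, and the way the real endpoints submit work to the pool is assumed.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Two workers: a third request waits until one of the first two finishes.
executor = ThreadPoolExecutor(max_workers=2)


def run_inference(image_bytes):
    """Placeholder for the blocking inference call (hypothetical)."""
    return len(image_bytes)


async def detect(image_bytes):
    loop = asyncio.get_running_loop()
    # The event loop stays free for SSE and health checks while this runs.
    return await loop.run_in_executor(executor, run_inference, image_bytes)


print(asyncio.run(detect(b"abc")))  # 3
```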
+
+## Software
+
+- **Python 3** with Cython 3.1.3 compilation required (setup.py build step).
+- **ONNX model**: `azaion.onnx` must be available via the Loader service.
+- **TensorRT engine files** are GPU-architecture-specific (filename encodes compute capability and SM count) — not portable across different GPU models.
+- **OpenCV 4.10.0** for image/video decoding and preprocessing.
+- **classes.json** must exist in the working directory at startup — no fallback if missing.
+- **Model input**: when the model declares dynamic input dimensions, they default to 1280×1280 (hardcoded in the TensorRT engine).
+- **Model output**: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
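The six-value output rows could be converted to center/size form along these lines. This is a sketch under assumptions: it treats the corner coordinates as pixels in the 1280×1280 model input, and the dictionary keys are hypothetical, not the service's DTO fields.

```python
def rows_to_boxes(rows, model_w=1280, model_h=1280):
    """Turn (x1, y1, x2, y2, confidence, class_id) rows into normalized boxes."""
    boxes = []
    for x1, y1, x2, y2, confidence, class_id in rows:
        boxes.append({
            "center_x": (x1 + x2) / 2 / model_w,
            "center_y": (y1 + y2) / 2 / model_h,
            "width": (x2 - x1) / model_w,
            "height": (y2 - y1) / model_h,
            "confidence": confidence,
            "class_id": int(class_id),
        })
    return boxes


print(rows_to_boxes([(0, 0, 1280, 640, 0.8, 3.0)]))
```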
+
+## Environment
+
+- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
+- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
+- **Logging directory**: `Logs/` directory must be writable for loguru file output.
+- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
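Resolving the two service URLs with their documented defaults might look like this. The `service_urls` helper and its optional `env` argument are hypothetical conveniences for illustration.

```python
import os


def service_urls(env=None):
    """Read LOADER_URL and ANNOTATIONS_URL, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "loader": env.get("LOADER_URL", "http://loader:8080"),
        "annotations": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }


print(service_urls({}))
```

Passing a plain dict in place of `os.environ` keeps the lookup easy to exercise without touching the process environment.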
+
+## Operational
+
+- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
+- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
+- **No CORS configuration**: cross-origin requests are not explicitly handled.
+- **No rate limiting**: the service has no built-in throttling.
+- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
+- **Single-instance state**: `_active_detections` dict and `_event_queues` list are in-memory — not shared across instances or persistent across restarts.
@@ -0,0 +1,70 @@
+# Azaion.Detections — Solution
+
+## 1. Product Solution Description
+
+Azaion.Detections is a microservice that performs automated object detection on aerial imagery and video. It accepts media via HTTP API, runs inference through ONNX Runtime or TensorRT engines, and returns structured detection results (bounding boxes, class labels, confidence scores). Results are delivered synchronously for single images, or streamed via SSE for batch/video media processing.
+
+```mermaid
+graph LR
+    Client["Client App"] -->|HTTP| API["FastAPI API"]
+    API -->|delegates| INF["Inference Pipeline"]
+    INF -->|runs| ENG["ONNX / TensorRT Engine"]
+    INF -->|downloads models| LDR["Loader Service"]
+    API -->|posts results| ANN["Annotations Service"]
+    API -->|streams| SSE["SSE Clients"]
+```
+
+## 2. Architecture
+
+### Component Architecture
+
+| Component | Modules | Responsibility |
+|-----------|---------|---------------|
+| Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, constants, logging, class registry |
+| Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML backends (Strategy pattern) |
+| Inference Pipeline | inference, loader_http_client | Engine lifecycle, preprocessing, postprocessing, media processing |
+| API | main | HTTP endpoints, SSE streaming, auth token forwarding |
+
+### Solution Assessment
+
+| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
+|----------|-------|-----------|-------------|-------------|----------|------|-----|
+| Cython inference pipeline | Python 3, Cython 3.1.3, OpenCV 4.10 | Near-C performance for tight detection loops while retaining Python ecosystem | Build complexity, limited IDE/debug support | Compilation step via setup.py | N/A | Low (open-source) | High — critical for postprocessing throughput |
+| Dual engine strategy (TensorRT + ONNX) | TensorRT 10.11, ONNX Runtime 1.22 | Maximum GPU speed with CPU fallback; auto-conversion and caching | Two code paths; GPU-specific engine files not portable | NVIDIA GPU (CC ≥ 6.1) for TensorRT | N/A | TensorRT free for NVIDIA GPUs | High — balances performance and portability |
+| FastAPI HTTP service | FastAPI, Uvicorn, Pydantic | Async SSE, auto-generated docs, fast development | Sync inference offloaded to ThreadPoolExecutor (2 workers) | Python 3.8+ | Bearer token pass-through | Low (open-source) | High — fits async streaming + sync inference pattern |
+| GSD-based image tiling | OpenCV, NumPy | Preserves small object detail in large aerial images | Complex tile dedup logic; overlap increases compute | Camera metadata (altitude, focal length, sensor width) | N/A | Compute cost scales with image size | High — essential for aerial imagery use case |
+| Lazy engine initialization | pynvml, threading | Fast API startup; background model conversion | First request has high latency; engine may be unavailable | None | N/A | N/A | High — prevents blocking startup on slow model download/conversion |
+
+## 3. Testing Strategy
+
+### Current State
+
+No tests found in the codebase. No test directories, test frameworks, or test runner configurations exist.
+
+### Observed Validation Mechanisms
+
+- Detection confidence threshold filtering (`probability_threshold`)
+- Overlapping detection removal (containment-biased NMS)
+- Physical size filtering via ground sampling distance and max_object_size_meters
+- Tile deduplication via coordinate proximity
+- Video annotation validity heuristics (time gap, movement, confidence)
+- AI availability status tracking with error states
+
+## 4. References
+
+| Artifact | Path | Description |
+|----------|------|-------------|
+| FastAPI application | `main.py` | API endpoints, DTOs, SSE streaming |
+| Inference orchestrator | `inference.pyx` / `.pxd` | Core pipeline logic |
+| Engine interface | `inference_engine.pyx` / `.pxd` | Abstract base class |
+| ONNX engine | `onnx_engine.pyx` | CPU/CUDA inference |
+| TensorRT engine | `tensorrt_engine.pyx` / `.pxd` | GPU inference + conversion |
+| Detection models | `annotation.pyx` / `.pxd` | Detection and Annotation classes |
+| Configuration | `ai_config.pyx` / `.pxd` | AIRecognitionConfig |
+| Status tracking | `ai_availability_status.pyx` / `.pxd` | Engine lifecycle status |
+| Constants & logging | `constants_inf.pyx` / `.pxd` | Constants, class registry, logging |
+| HTTP client | `loader_http_client.py` | Model download/upload |
+| Class definitions | `classes.json` | 19 detection classes with metadata |
+| Build config | `setup.py` | Cython compilation |
+| CPU dependencies | `requirements.txt` | Python package versions |
+| GPU dependencies | `requirements-gpu.txt` | TensorRT, PyCUDA additions |
@@ -0,0 +1,135 @@
+# Codebase Discovery
+
+## Directory Tree
+
+```
+detections/
+├── main.py                    # FastAPI entry point
+├── setup.py                   # Cython build configuration
+├── requirements.txt           # CPU dependencies
+├── requirements-gpu.txt       # GPU dependencies (extends requirements.txt)
+├── classes.json               # Object detection class definitions (19 classes)
+├── .gitignore
+├── inference.pyx / .pxd       # Core inference orchestrator (Cython)
+├── inference_engine.pyx / .pxd # Abstract base engine class (Cython)
+├── onnx_engine.pyx            # ONNX Runtime inference engine (Cython)
+├── tensorrt_engine.pyx / .pxd # TensorRT inference engine (Cython)
+├── annotation.pyx / .pxd      # Detection & Annotation data models (Cython)
+├── ai_config.pyx / .pxd       # AI recognition config (Cython)
+├── ai_availability_status.pyx / .pxd # AI status enum & state (Cython)
+├── constants_inf.pyx / .pxd   # Constants, logging, class registry (Cython)
+└── loader_http_client.py      # HTTP client for model loading/uploading
+```
+
+## Tech Stack Summary
+
+| Aspect | Technology |
+|--------|-----------|
+| Language | Python 3 + Cython |
+| Web Framework | FastAPI + Uvicorn |
+| ML Inference (CPU) | ONNX Runtime 1.22.0 |
+| ML Inference (GPU) | TensorRT 10.11.0 + PyCUDA 2025.1.1 |
+| Image Processing | OpenCV 4.10.0 |
+| Serialization | msgpack 1.1.1 |
+| HTTP Client | requests 2.32.4 |
+| Logging | loguru 0.7.3 |
+| GPU Monitoring | pynvml 12.0.0 |
+| Numeric | NumPy 2.3.0 |
+| Build | Cython 3.1.3 + setuptools |
+
+## Dependency Graph
+
+### Internal Module Dependencies
+
+```
+constants_inf       ← (leaf) no internal deps
+ai_config           ← (leaf) no internal deps
+inference_engine    ← (leaf) no internal deps
+loader_http_client  ← (leaf) no internal deps
+
+ai_availability_status → constants_inf
+annotation             → constants_inf
+
+onnx_engine     → inference_engine, constants_inf
+tensorrt_engine → inference_engine, constants_inf
+
+inference → constants_inf, ai_availability_status, annotation, ai_config,
+            onnx_engine | tensorrt_engine (conditional on GPU availability)
+
+main → inference, constants_inf, loader_http_client
+```
+
+### Mermaid Diagram
+
+```mermaid
+graph TD
+    main["main.py (FastAPI)"]
+    inference["inference"]
+    onnx_engine["onnx_engine"]
+    tensorrt_engine["tensorrt_engine"]
+    inference_engine["inference_engine (abstract)"]
+    annotation["annotation"]
+    ai_availability_status["ai_availability_status"]
+    ai_config["ai_config"]
+    constants_inf["constants_inf"]
+    loader_http_client["loader_http_client"]
+
+    main --> inference
+    main --> constants_inf
+    main --> loader_http_client
+
+    inference --> constants_inf
+    inference --> ai_availability_status
+    inference --> annotation
+    inference --> ai_config
+    inference -.->|GPU available| tensorrt_engine
+    inference -.->|CPU fallback| onnx_engine
+
+    onnx_engine --> inference_engine
+    onnx_engine --> constants_inf
+
+    tensorrt_engine --> inference_engine
+    tensorrt_engine --> constants_inf
+
+    ai_availability_status --> constants_inf
+    annotation --> constants_inf
+```
+
+## Topological Processing Order
+
+1. `constants_inf` (leaf)
+2. `ai_config` (leaf)
+3. `inference_engine` (leaf)
+4. `loader_http_client` (leaf)
+5. `ai_availability_status` (depends: constants_inf)
+6. `annotation` (depends: constants_inf)
+7. `onnx_engine` (depends: inference_engine, constants_inf)
+8. `tensorrt_engine` (depends: inference_engine, constants_inf)
+9. `inference` (depends: constants_inf, ai_availability_status, annotation, ai_config, onnx_engine/tensorrt_engine)
+10. `main` (depends: inference, constants_inf, loader_http_client)
+
+## Entry Points
+
+- `main.py` — FastAPI application, serves HTTP API on uvicorn
+
+## Leaf Modules
+
+- `constants_inf` — constants, logging, class registry
+- `ai_config` — recognition configuration data class
+- `inference_engine` — abstract base class for engines
+- `loader_http_client` — HTTP client for external loader service
+
+## Cycles
+
+None detected.
+
+## External Services
+
+| Service | URL Source | Purpose |
+|---------|-----------|---------|
+| Loader | `LOADER_URL` env var (default `http://loader:8080`) | Download/upload AI models |
+| Annotations | `ANNOTATIONS_URL` env var (default `http://annotations:8080`) | Post detection results, refresh auth tokens |
+
+## Data Files
+
+- `classes.json` — 19 object detection classes with Ukrainian short names, colors, and max physical size in meters (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, etc.)
@@ -0,0 +1,98 @@
+# Verification Log
+
+## Summary
+
+| Metric | Count |
+|--------|-------|
+| Total entities verified | 82 |
+| Entities confirmed correct | 78 |
+| Issues found | 4 |
+| Corrections applied | 4 |
+| Remaining gaps | 0 |
+| Completeness score | 10/10 modules covered |
+
+## Entity Verification
+
+### Classes & Functions — All Verified
+
+| Entity | Module | Status |
+|--------|--------|--------|
+| AnnotationClass | constants_inf | Confirmed |
+| WeatherMode enum (Norm/Wint/Night) | constants_inf | Confirmed |
+| log(), logerror(), format_time() | constants_inf | Confirmed |
+| annotations_dict | constants_inf | Confirmed |
+| AIRecognitionConfig | ai_config | Confirmed |
+| from_dict(), from_msgpack() | ai_config | Confirmed |
+| AIAvailabilityEnum | ai_availability_status | Confirmed |
+| AIAvailabilityStatus | ai_availability_status | Confirmed |
+| Detection, Annotation | annotation | Confirmed |
+| InferenceEngine | inference_engine | Confirmed |
+| OnnxEngine | onnx_engine | Confirmed |
+| TensorRTEngine | tensorrt_engine | Confirmed |
+| convert_from_onnx, get_engine_filename, get_gpu_memory_bytes | tensorrt_engine | Confirmed |
+| Inference | inference | Confirmed |
+| LoaderHttpClient, LoadResult | loader_http_client | Confirmed |
+| DetectionDto, DetectionEvent, HealthResponse, AIConfigDto | main | Confirmed |
+| TokenManager | main | Confirmed |
+| detection_to_dto | main | Confirmed |
+
+### API Endpoints — All Verified
+
+| Endpoint | Method | Status |
+|----------|--------|--------|
+| /health | GET | Confirmed |
+| /detect | POST | Confirmed |
+| /detect/{media_id} | POST | Confirmed |
+| /detect/stream | GET | Confirmed |
+
+### Constants — All Verified
+
+All 10 constants in constants_inf module verified against code values.
+
+## Issues Found & Corrections Applied
+
+### Issue 1: Legacy PXD Declarations (constants_inf)
+
+**Location**: `constants_inf.pxd` lines 3-5
+
+**Finding**: The `.pxd` header declares `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, and `ANNOTATIONS_QUEUE` which are NOT defined in the `.pyx` implementation. Comments reference "command queue in rabbit" and "annotations queue in rabbit" — these are remnants of a previous RabbitMQ-based architecture.
+
+**Correction**: Added note to `modules/constants_inf.md` documenting these as orphaned legacy declarations.
+
+### Issue 2: Unused serialize() Methods
+
+**Location**: `annotation.pyx` (`Annotation.serialize`, through which `Detection` objects are also serialized), `ai_availability_status.pyx` (`AIAvailabilityStatus.serialize`)
+
+**Finding**: Both `serialize()` methods are defined but never called anywhere in the codebase. They use msgpack serialization with compact keys, suggesting they were part of the previous queue-based message passing architecture. The current HTTP API uses Pydantic JSON serialization instead.
+
+**Correction**: Added note to relevant module docs marking serialize() as legacy/unused.
+
+### Issue 3: Unused from_msgpack() Factory Method
+
+**Location**: `ai_config.pyx` line 55
+
+**Finding**: `AIRecognitionConfig.from_msgpack()` is defined but never called. Only `from_dict()` is used (called from `inference.pyx`). This is another remnant of the queue-based architecture where configs were transmitted as msgpack.
+
+**Correction**: Added note to `modules/ai_config.md`.
+
+### Issue 4: Unused file_data Field
+
+**Location**: `ai_config.pyx` line 31
+
+**Finding**: `AIRecognitionConfig.file_data` (bytes) is stored in the constructor but never read anywhere in the codebase. It's populated from both `from_dict` and `from_msgpack` but has no consumer.
+
+**Correction**: Added note to `modules/ai_config.md`.
+
+## Cross-Document Consistency
+
+| Check | Result |
+|-------|--------|
+| Component docs match architecture doc | Consistent |
+| Flow diagrams match component interfaces | Consistent |
+| Data model matches module docs | Consistent |
+| Dependency graph in discovery matches component diagram | Consistent |
+| Constants values in docs match code | Confirmed |
+
+## Infrastructure Observation
+
+No Dockerfile, docker-compose.yml, or CI/CD configuration was found in the repository. The architecture doc's deployment section is inferred from the service hostnames (loader:8080, annotations:8080), which suggest a containerized deployment, but no container definitions exist in this repo. They likely reside in a parent or infrastructure repository.
@@ -0,0 +1,86 @@
+# Azaion.Detections — Documentation Report
+
+## Executive Summary
+
+Azaion.Detections is a Python/Cython microservice for automated aerial object detection. It exposes a FastAPI HTTP API that accepts images and video, runs YOLO-based inference through TensorRT (GPU) or ONNX Runtime (CPU fallback), and returns structured detection results. The system supports large aerial image tiling with ground sampling distance-based sizing, real-time video processing with frame sampling and tracking heuristics, and Server-Sent Events streaming for live detection updates.
+
+The codebase consists of 10 modules (2 Python, 8 Cython) organized into 4 components. It integrates with two external services: a Loader service for model storage and an Annotations service for result persistence. The system has no tests, no containerization config in this repo, and several legacy artifacts from a prior RabbitMQ-based architecture.
+
+## Problem Statement
+
+Automated detection of military and infrastructure objects (19 classes including vehicles, artillery, trenches, personnel, camouflage) from aerial imagery and video feeds. Replaces manual analyst review with real-time AI-powered detection, enabling rapid situational awareness for reconnaissance operations.
+
+## Architecture Overview
+
+**Tech stack**: Python 3 + Cython 3.1.3 | FastAPI + Uvicorn | ONNX Runtime 1.22.0 | TensorRT 10.11.0 | OpenCV 4.10.0 | NumPy 2.3.0
+
+**Key architectural decisions**:
+1. Cython for performance-critical inference loops
+2. Dual engine strategy (TensorRT + ONNX fallback) with automatic conversion and caching
+3. Lazy engine initialization for fast API startup
+4. GSD-based image tiling for large aerial images
+
+## Component Summary
+
+| # | Component | Modules | Purpose | Dependencies |
+|---|-----------|---------|---------|-------------|
+| 01 | Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, enums, constants, logging, class registry | None (foundation) |
+| 02 | Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML inference backends (Strategy pattern) | Domain |
+| 03 | Inference Pipeline | inference, loader_http_client | Engine lifecycle, media preprocessing/postprocessing, model loading | Domain, Engines |
+| 04 | API | main | HTTP endpoints, SSE streaming, auth token management | Domain, Pipeline |
+
+## System Flows
+
+| # | Flow | Trigger | Description |
+|---|------|---------|-------------|
+| F1 | Health Check | GET /health | Returns AI engine availability status |
+| F2 | Single Image Detection | POST /detect | Synchronous image inference, returns detections |
+| F3 | Media Detection (Async) | POST /detect/{media_id} | Background processing with SSE streaming + Annotations posting |
+| F4 | SSE Streaming | GET /detect/stream | Real-time event delivery to connected clients |
+| F5 | Engine Initialization | First detection request | TensorRT → ONNX fallback → background conversion |
+| F6 | TensorRT Conversion | No cached engine | Background ONNX→TensorRT conversion and upload |
+
+## Risk Observations
+
+| Risk | Severity | Source |
+|------|----------|--------|
+| No tests in the codebase | High | Verification (Step 4) |
+| No CORS, rate limiting, or request size limits | Medium | Security review (main.py) |
+| JWT token handled without signature verification | Medium | Security review (main.py) |
+| Legacy unused code (serialize, from_msgpack, queue declarations) | Low | Verification (Step 4) |
+| No graceful shutdown for in-progress detections | Medium | Architecture review |
+| Single-instance in-memory state (_active_detections, _event_queues) | Medium | Scalability review |
+| No Dockerfile or CI/CD config in this repository | Low | Infrastructure review |
+| classes.json must exist at startup — no fallback | Low | Reliability review |
+| Hardcoded 1280×1280 default for dynamic TensorRT dimensions | Low | Flexibility review |
+
+## Open Questions
+
+1. Where is the Dockerfile / docker-compose.yml for this service? Likely in a separate infrastructure repository.
+2. Is the legacy RabbitMQ code (serialize methods, from_msgpack, queue constants in .pxd) planned for removal?
+3. What is the intended scaling model — single instance per GPU, or horizontal scaling with shared state?
+4. Should JWT signature verification be added at the detection service level, or is the current pass-through approach intentional?
+5. Are there integration or end-to-end tests in a separate repository?
+
+## Artifact Index
+
+| Path | Description |
+|------|-------------|
+| `_docs/00_problem/problem.md` | Problem statement |
+| `_docs/00_problem/restrictions.md` | System restrictions and constraints |
+| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
+| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and parameters |
+| `_docs/01_solution/solution.md` | Solution description and assessment |
+| `_docs/02_document/00_discovery.md` | Codebase discovery (tech stack, dependency graph) |
+| `_docs/02_document/modules/*.md` | Per-module documentation (10 modules) |
+| `_docs/02_document/components/01_domain/description.md` | Domain component spec |
+| `_docs/02_document/components/02_inference_engines/description.md` | Inference Engines component spec |
+| `_docs/02_document/components/03_inference_pipeline/description.md` | Inference Pipeline component spec |
+| `_docs/02_document/components/04_api/description.md` | API component spec |
+| `_docs/02_document/diagrams/components.md` | Component relationship diagram |
+| `_docs/02_document/architecture.md` | System architecture document |
+| `_docs/02_document/system-flows.md` | System flow diagrams and descriptions |
+| `_docs/02_document/data_model.md` | Data model with ERD |
+| `_docs/02_document/04_verification_log.md` | Verification pass results |
+| `_docs/02_document/FINAL_report.md` | This report |
+| `_docs/02_document/state.json` | Documentation process state |
@@ -0,0 +1,151 @@
+# Azaion.Detections — Architecture
+
+## 1. System Context
+
+**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
+
+**System boundaries**:
+- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
+- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications
+
+**External systems**:
+
+| System | Integration Type | Direction | Purpose |
+|--------|-----------------|-----------|---------|
+| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
+| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
+| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
+
+## 2. Technology Stack
+
+| Layer | Technology | Version | Rationale |
+|-------|-----------|---------|-----------|
+| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
+| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
+| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
+| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
+| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
+| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
+| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
+| Logging | loguru | 0.7.3 | Structured file + console logging |
+| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
+| Numeric | NumPy | 2.3.0 | Tensor manipulation |
+
+## 3. Deployment Model
+
+**Infrastructure**: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).
+
+**Environment-specific configuration**:
+
+| Config | Development | Production |
+|--------|-------------|------------|
+| LOADER_URL | `http://loader:8080` (default) | Environment variable |
+| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
+| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
+| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |
+
+## 4. Data Model Overview
+
+**Core entities**:
+
+| Entity | Description | Owned By Component |
+|--------|-------------|--------------------|
+| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
+| Detection | Single bounding box with class + confidence | 01 Domain |
+| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
+| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
+| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
+| DetectionDto | API-facing detection response | 04 API |
+| DetectionEvent | SSE event payload | 04 API |
+
+**Key relationships**:
+- Annotation → Detection: one-to-many (detections within a frame/tile)
+- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
+- Annotation → Media: many-to-one (multiple annotations per video/image)
+
+**Data flow summary**:
+- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
+- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
+
+## 5. Integration Points
+
+### Internal Communication
+
+| From | To | Protocol | Pattern | Notes |
+|------|----|----------|---------|-------|
+| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
+| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
+| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
+
+### External Integrations
+
+| External System | Protocol | Auth | Rate Limits | Failure Mode |
+|----------------|----------|------|-------------|--------------|
+| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
+| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
+| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
+
+## 6. Non-Functional Requirements
+
+| Requirement | Target | Measurement | Priority |
+|------------|--------|-------------|----------|
+| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
+| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
+| Log retention | 30 days | loguru rotation config | Medium |
+| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
+| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename encodes compute capability (CC) and SM count | High |
+
+## 7. Security Architecture
+
+**Authentication**: The client's Bearer JWT is passed through unchanged to the Annotations service. The JWT `exp` claim is decoded locally (base64 only, no signature verification) and is used solely to time token refresh.
+
+**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.
+
+**Data protection**:
+- At rest: not applicable (no local persistence of detection results)
+- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
+- Secrets management: tokens received per-request, no stored credentials
+
+**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.
+
+## 8. Key Architectural Decisions
+
+### ADR-001: Cython for Inference Pipeline
+
+**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
+
+**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.
+
+**Alternatives considered**:
+1. Pure Python — rejected due to loop-heavy postprocessing performance
+2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
+
+**Consequences**: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.
+
+### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
+
+**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
+
+**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
+
+**Alternatives considered**:
+1. TensorRT only — rejected; would break CPU-only development/testing
+2. ONNX only — rejected; significantly slower on GPU vs TensorRT
+
+**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.
+
+### ADR-003: Lazy Inference Initialization
+
+**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
+
+**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.
+
+**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.
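A minimal sketch of the lazy-initialization pattern that ADR-003 describes. The `factory` parameter and the lock are illustrative assumptions; the source only states that `Inference` is created on the first detection request, not how concurrent first requests are handled.

```python
import threading
from typing import Callable, Optional

# Module-level holder for the expensive object (hypothetical names).
_instance: Optional[object] = None
_instance_lock = threading.Lock()


def get_inference(factory: Callable[[], object]) -> object:
    """Create the expensive object on first use and reuse it afterwards."""
    global _instance
    if _instance is None:
        # Double-checked locking: only one caller runs the factory.
        with _instance_lock:
            if _instance is None:
                _instance = factory()
    return _instance
```

With this shape, a health endpoint can answer without ever calling `get_inference`, while the first detection request pays the initialization cost.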
+
+### ADR-004: Large Image Tiling with GSD-Based Sizing
+
+**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
+
+**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.
+
+**Consequences**: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
@@ -0,0 +1,95 @@
+# Component: Domain Models & Configuration
+
+## Overview
+
+**Purpose**: Provides all data models, enums, constants, detection class registry, and logging infrastructure used across the system.
+
+**Pattern**: Shared kernel — leaf-level types and utilities consumed by all other components.
+
+**Upstream**: None (foundation layer).
+**Downstream**: Inference Engines, Inference Pipeline, API.
+
+## Modules
+
+| Module | Role |
+|--------|------|
+| `constants_inf` | Application constants, logging, detection class registry from `classes.json` |
+| `ai_config` | `AIRecognitionConfig` data class with factory methods |
+| `ai_availability_status` | Thread-safe `AIAvailabilityStatus` tracker with `AIAvailabilityEnum` |
+| `annotation` | `Detection` and `Annotation` data models |
+
+## Internal Interfaces
+
+### constants_inf
+
+```
+cdef log(str log_message) -> void
+cdef logerror(str error) -> void
+cdef format_time(int ms) -> str
+annotations_dict: dict[int, AnnotationClass]
+```
+
+### ai_config
+
+```
+cdef class AIRecognitionConfig:
+    @staticmethod cdef from_msgpack(bytes data) -> AIRecognitionConfig
+    @staticmethod def from_dict(dict data) -> AIRecognitionConfig
+```
+
+### ai_availability_status
+
+```
+cdef class AIAvailabilityStatus:
+    cdef set_status(AIAvailabilityEnum status, str error_message=None)
+    cdef bytes serialize()
+    # __str__ for display
+```
+
+### annotation
+
+```
+cdef class Detection:
+    cdef overlaps(Detection det2, float confidence_threshold) -> bool
+    # __eq__ for tile deduplication
+
+cdef class Annotation:
+    cdef bytes serialize()
+```
+
+## External API
+
+None — this is a shared kernel, not an externally-facing component.
+
+## Data Access Patterns
+
+- `classes.json` read once at module import time (constants_inf)
+- All data is in-memory, no database access
+
+## Implementation Details
+
+- Cython `cdef` classes for performance-critical detection processing
+- Thread-safe status tracking via `threading.Lock` in `AIAvailabilityStatus`
+- `Detection.__eq__` uses coordinate proximity threshold for tile deduplication
+- `Detection.overlaps` uses containment-biased metric (overlap / min_area) rather than standard IoU
+- Weather mode system triples the class registry (Norm/Wint/Night offsets of 0/20/40)
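To illustrate why a containment-biased metric behaves differently from IoU, here is a small sketch. `Box` is a hypothetical stand-in for `Detection` (center-format, normalized coordinates), and the exact arithmetic inside `Detection.overlaps` is not shown in the source, so treat this as one plausible reading of "overlap / min_area".

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Center-format box in normalized coordinates (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def _intersection(a: Box, b: Box) -> float:
    """Area of the rectangle shared by both boxes (0 if disjoint)."""
    left = max(a.x - a.w / 2, b.x - b.w / 2)
    right = min(a.x + a.w / 2, b.x + b.w / 2)
    top = max(a.y - a.h / 2, b.y - b.h / 2)
    bottom = min(a.y + a.h / 2, b.y + b.h / 2)
    return max(0.0, right - left) * max(0.0, bottom - top)


def containment_ratio(a: Box, b: Box) -> float:
    """Intersection divided by the area of the smaller box."""
    min_area = min(a.w * a.h, b.w * b.h)
    return _intersection(a, b) / min_area if min_area > 0 else 0.0


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union, shown for comparison."""
    inter = _intersection(a, b)
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0
```

A small box fully inside a large one scores about 1.0 on containment but only about 0.25 on IoU when the large box has four times the area, which is why the containment form is stricter about nested duplicates.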
+
+## Caveats
+
+- `classes.json` must exist in the working directory at import time — no fallback
+- `Detection.__eq__` is designed specifically for tile deduplication, not general equality
+- `annotations_dict` is a module-level global — not injectable/configurable at runtime
+
+## Dependency Graph
+
+```mermaid
+graph TD
+    ai_availability_status --> constants_inf
+    annotation --> constants_inf
+    ai_config
+    constants_inf
+```
+
+## Logging Strategy
+
+All logging flows through `constants_inf.log` and `constants_inf.logerror`, which delegate to loguru with file rotation and console output.
@@ -0,0 +1,86 @@
+# Component: Inference Engines
+
+## Overview
+
+**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
+
+**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
+
+**Upstream**: Domain (constants_inf for logging).
+**Downstream**: Inference Pipeline (creates and uses engines).
+
+## Modules
+
+| Module | Role |
+|--------|------|
+| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
+| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
+| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |
+
+## Internal Interfaces
+
+### InferenceEngine (abstract)
+
+```
+cdef class InferenceEngine:
+    __init__(bytes model_bytes, int batch_size=1, **kwargs)
+    cdef tuple get_input_shape()       # -> (height, width)
+    cdef int get_batch_size()          # -> batch_size
+    cdef run(input_data)               # -> list of output tensors
+```
+
+### OnnxEngine
+
+```
+cdef class OnnxEngine(InferenceEngine):
+    # Implements all base methods
+    # Provider priority: CUDA > CPU
+```
+
+### TensorRTEngine
+
+```
+cdef class TensorRTEngine(InferenceEngine):
+    # Implements all base methods
+    @staticmethod get_gpu_memory_bytes(int device_id) -> int
+    @staticmethod get_engine_filename(int device_id) -> str
+    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
+```
+
+## External API
+
+None — internal component consumed by Inference Pipeline.
+
+## Data Access Patterns
+
+- Model bytes loaded in-memory (provided by caller)
+- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
+- ONNX: managed by onnxruntime internally
+
+## Implementation Details
+
+- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
+- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
+- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
+- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
+- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
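The filename pattern above can be sketched as a pure function. The parameter names are assumptions; how the real code obtains the compute capability and SM count (for example through pynvml or PyCUDA) is not shown in the source.

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build a GPU-specific engine name from compute capability and SM count."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

Because the name depends only on GPU properties, two machines with the same GPU model would resolve to the same cached engine file.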
+
+## Caveats
+
+- TensorRT engine files are GPU-architecture-specific and not portable
+- `pycuda.autoinit` import is required as side-effect (initializes CUDA context)
+- Dynamic shapes defaulting to 1280×1280 is hardcoded — not configurable
+
+## Dependency Graph
+
+```mermaid
+graph TD
+    onnx_engine --> inference_engine
+    onnx_engine --> constants_inf
+    tensorrt_engine --> inference_engine
+    tensorrt_engine --> constants_inf
+```
+
+## Logging Strategy
+
+Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.
@@ -0,0 +1,129 @@
+# Component: Inference Pipeline
+
+## Overview
+
+**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
+
+**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
+
+**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
+**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
+
+## Modules
+
+| Module | Role |
+|--------|------|
+| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
+| `loader_http_client` | HTTP client for model download/upload from Loader service |
+
+## Internal Interfaces
+
+### Inference
+
+```
+cdef class Inference:
+    __init__(loader_client)
+    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
+    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
+    cpdef stop()
+
+    # Internal pipeline stages:
+    cdef init_ai()
+    cdef preprocess(frames) -> ndarray
+    cdef postprocess(output, ai_config) -> list[list[Detection]]
+    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
+    cdef _process_images(AIRecognitionConfig, list[str] paths)
+    cdef _process_video(AIRecognitionConfig, str video_name)
+```
+
+### LoaderHttpClient
+
+```
+class LoaderHttpClient:
+    load_big_small_resource(str filename, str directory) -> LoadResult
+    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
+```
+
+## External API
+
+None — internal component, consumed by API layer.
+
+## Data Access Patterns
+
+- Model bytes downloaded from Loader service (HTTP)
+- Converted TensorRT engines uploaded back to Loader for caching
+- Video frames read via OpenCV VideoCapture
+- Images read via OpenCV imread
+- All processing is in-memory
+
+## Implementation Details
+
+### Engine Initialization Strategy
+
+```
+1. Check GPU availability (pynvml, compute capability ≥ 6.1)
+2. If GPU:
+   a. Try loading pre-built TensorRT engine from Loader
+   b. If fails → download ONNX model → start background conversion thread
+   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
+   d. Next init_ai() call: load from _converted_model_bytes
+3. If no GPU:
+   a. Download ONNX model from Loader → create OnnxEngine
+```
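The strategy above can be read as a small decision function. This is a sketch under stated assumptions: `InitAction` and `next_init_action` are hypothetical names, and checking the in-memory converted model before the pre-built engine is an assumed ordering that the source does not confirm.

```python
from enum import Enum


class InitAction(Enum):
    CREATE_ONNX_ENGINE = "create_onnx_engine"
    LOAD_TENSORRT_FROM_MEMORY = "load_tensorrt_from_memory"
    LOAD_TENSORRT_FROM_LOADER = "load_tensorrt_from_loader"
    START_BACKGROUND_CONVERSION = "start_background_conversion"


def next_init_action(
    has_compatible_gpu: bool,
    prebuilt_engine_available: bool,
    converted_model_ready: bool,
) -> InitAction:
    """Pick the next engine-initialization step from the current state."""
    if not has_compatible_gpu:
        # Step 3: no GPU, use the ONNX model directly.
        return InitAction.CREATE_ONNX_ENGINE
    if converted_model_ready:
        # Step 2d: a background conversion already produced engine bytes.
        return InitAction.LOAD_TENSORRT_FROM_MEMORY
    if prebuilt_engine_available:
        # Step 2a: a cached engine exists on the Loader service.
        return InitAction.LOAD_TENSORRT_FROM_LOADER
    # Steps 2b and 2c: download ONNX and convert in the background.
    return InitAction.START_BACKGROUND_CONVERSION
```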
+
+### Preprocessing
+
+- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
+- Batch via `np.vstack`
+
+### Postprocessing
+
+- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
+- Normalize coordinates to 0..1
+- Convert to center-format Detection objects
+- Filter by confidence threshold
+- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
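The postprocessing steps above can be sketched in plain Python. `Det`, `to_detections`, and `remove_overlapping` are hypothetical names, and dividing by the model input size to normalize is an assumption about the raw output's units. The overlap test is passed in as a callable so the sketch does not commit to a particular metric.

```python
from typing import Callable, List, NamedTuple, Sequence


class Det(NamedTuple):
    x: float  # center x, normalized
    y: float  # center y, normalized
    w: float
    h: float
    cls: int
    confidence: float


def to_detections(
    rows: Sequence[Sequence[float]],
    input_w: int,
    input_h: int,
    probability_threshold: float = 0.25,
) -> List[Det]:
    """Convert [x1, y1, x2, y2, conf, cls] rows to normalized center format."""
    dets = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < probability_threshold:
            continue  # drop low-confidence rows
        dets.append(Det(
            x=(x1 + x2) / 2 / input_w,
            y=(y1 + y2) / 2 / input_h,
            w=(x2 - x1) / input_w,
            h=(y2 - y1) / input_h,
            cls=int(cls),
            confidence=conf,
        ))
    return dets


def remove_overlapping(
    dets: Sequence[Det], overlaps: Callable[[Det, Det], bool]
) -> List[Det]:
    """Greedy pass: higher confidence wins, ties go to the lower class id."""
    ordered = sorted(dets, key=lambda d: (-d.confidence, d.cls))
    kept: List[Det] = []
    for det in ordered:
        if not any(overlaps(det, k) for k in kept):
            kept.append(det)
    return kept
```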
+
+### Large Image Tiling
+
+- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
+- Tile size: `METERS_IN_TILE / GSD` pixels
+- Overlap: configurable percentage
+- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
+- Physical size filtering: remove detections exceeding class max_object_size_meters
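A worked sketch of the GSD and tile-size formulas above. The units are assumptions (sensor width and focal length in millimetres, altitude in metres, giving metres per pixel), as are the rounding and the stride calculation; `METERS_IN_TILE = 25` follows the value stated in the acceptance criteria.

```python
METERS_IN_TILE = 25  # tile physical size in metres


def ground_sampling_distance(
    sensor_width: float, altitude: float, focal_length: float, image_width: int
) -> float:
    """Metres of ground covered by one pixel."""
    return sensor_width * altitude / (focal_length * image_width)


def tile_size_px(gsd: float) -> int:
    """Tile edge in pixels so that one tile spans METERS_IN_TILE metres."""
    return round(METERS_IN_TILE / gsd)


def tile_stride_px(tile_px: int, overlap_percent: float) -> int:
    """Step between tile origins for a given overlap (assumed formula)."""
    return max(1, round(tile_px * (1 - overlap_percent / 100)))
```

For example, a 23.5 mm sensor with a 24 mm lens at 400 m altitude and a 6000-pixel-wide image gives a GSD of roughly 0.065 m/pixel, so a 25 m tile is about 383 pixels wide.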
+
+### Video Processing
+
+- Frame sampling: every Nth frame
+- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
+- JPEG encoding of valid frames for annotation images
+
+### Callbacks
+
+- `annotation_callback(annotation, percent)` — called per valid annotation
+- `status_callback(media_name, count)` — called when all detections for a media item are complete
+
+## Caveats
+
+- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
+- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
+- `init_ai()` is called on every `run_detect`; it is idempotent but re-checks engine state each time
+- Video processing is sequential per video (no parallel video processing)
+- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
+
+## Dependency Graph
+
+```mermaid
+graph TD
+    inference --> constants_inf
+    inference --> ai_availability_status
+    inference --> annotation
+    inference --> ai_config
+    inference -.-> onnx_engine
+    inference -.-> tensorrt_engine
+    inference --> loader_http_client
+```
+
+## Logging Strategy
+
+Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.
@@ -0,0 +1,103 @@
+# Component: API
+
+## Overview
+
+**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.
+
+**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.
+
+**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
+**Downstream**: None (top-level, client-facing).
+
+## Modules
+
+| Module | Role |
+|--------|------|
+| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |
+
+## External API Specification
+
+### GET /health
+
+**Response**: `HealthResponse`
+```json
+{
+  "status": "healthy",
+  "aiAvailability": "Enabled",
+  "errorMessage": null
+}
+```
+`aiAvailability` values: None, Downloading, Converting, Uploading, Enabled, Warning, Error.
+
+### POST /detect
+
+**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
+**Response**: `list[DetectionDto]`
+```json
+[
+  {
+    "centerX": 0.5,
+    "centerY": 0.5,
+    "width": 0.1,
+    "height": 0.1,
+    "classNum": 0,
+    "label": "ArmorVehicle",
+    "confidence": 0.85
+  }
+]
+```
+**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).
+
+### POST /detect/{media_id}
+
+**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
+**Response**: `{"status": "started", "mediaId": "..."}`, returned immediately as an acknowledgement in the style of 202 Accepted; detection continues asynchronously.
+**Errors**: 409 (duplicate detection for same media_id).
+**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.
+
+### GET /detect/stream
+
+**Response**: `text/event-stream` (SSE).
+```
+data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "mediaPercent": 50}
+```
+`mediaStatus` values: AIProcessing, AIProcessed, Error.
+
+## Data Access Patterns
+
+- In-memory state:
+  - `_active_detections: dict[str, bool]` — guards against duplicate media processing
+  - `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
+- No database access
+
+## Implementation Details
+
+- `Inference` is lazy-loaded on first use via `get_inference()` global function
+- `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
+- SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
+- `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
+- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
+- Annotations posted to external service with base64-encoded frame image
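The token-timing behaviour can be sketched with the standard library alone. `jwt_exp` and `needs_refresh` are hypothetical names, and treating an unreadable token as needing refresh is an assumption. As in the description above, nothing here verifies the signature, so the result must not be used for authorization.

```python
import base64
import json
import time
from typing import Optional


def jwt_exp(token: str) -> Optional[int]:
    """Read the `exp` claim from a JWT payload without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    try:
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    return int(exp) if isinstance(exp, (int, float)) else None


def needs_refresh(
    token: str, now: Optional[float] = None, margin_seconds: int = 60
) -> bool:
    """True when the token expires within `margin_seconds` (or cannot be read)."""
    exp = jwt_exp(token)
    if exp is None:
        return True  # assumption: unreadable tokens are refreshed
    current = time.time() if now is None else now
    return current >= exp - margin_seconds
```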
+
+## Caveats
+
+- No CORS middleware configured
+- No rate limiting
+- No request body size limits beyond FastAPI defaults
+- `_active_detections` is an in-memory dict — not persistent across restarts, not distributed
+- SSE queue overflow silently drops events (QueueFull caught and ignored)
+- JWT token handling has no signature verification — relies entirely on the Annotations service for auth
+- No graceful shutdown handling for in-progress detections
+
+## Dependency Graph
+
+```mermaid
+graph TD
+    main --> inference
+    main --> constants_inf
+    main --> loader_http_client
+```
+
+## Logging Strategy
+
+No explicit logging in main.py — errors are caught and returned as HTTP responses. Logging happens in downstream components.
@@ -0,0 +1,157 @@
+# Azaion.Detections — Data Model
+
+## Entity-Relationship Diagram
+
+```mermaid
+erDiagram
+    AnnotationClass {
+        int id PK
+        string name
+        string color
+        int max_object_size_meters
+    }
+
+    Detection {
+        double x
+        double y
+        double w
+        double h
+        int cls FK
+        double confidence
+        string annotation_name
+    }
+
+    Annotation {
+        string name PK
+        string original_media_name
+        long time
+        bytes image
+    }
+
+    AIRecognitionConfig {
+        int frame_period_recognition
+        double frame_recognition_seconds
+        double probability_threshold
+        double tracking_distance_confidence
+        double tracking_probability_increase
+        double tracking_intersection_threshold
+        int big_image_tile_overlap_percent
+        int model_batch_size
+        double altitude
+        double focal_length
+        double sensor_width
+    }
+
+    AIAvailabilityStatus {
+        int status
+        string error_message
+    }
+
+    DetectionDto {
+        double centerX
+        double centerY
+        double width
+        double height
+        int classNum
+        string label
+        double confidence
+    }
+
+    DetectionEvent {
+        string mediaId
+        string mediaStatus
+        int mediaPercent
+    }
+
+    Annotation ||--o{ Detection : contains
+    Detection }o--|| AnnotationClass : "classified as"
+    DetectionEvent ||--o{ DetectionDto : annotations
+```
+
+## Core Domain Entities
+
+### AnnotationClass
+
+Loaded from `classes.json` at startup. 19 base classes × 3 weather modes = up to 57 entries in `annotations_dict`.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| id | int | Unique class ID (0-18 base, +20 for winter, +40 for night) |
+| name | str | Display name (e.g. "ArmorVehicle", "Truck(Wint)") |
+| color | str | Hex color for visualization |
+| max_object_size_meters | int | Maximum physical size — detections exceeding this are filtered out |
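A sketch of how the weather offsets could expand the base classes into the full registry. The input record layout and the function name are assumptions, since the schema of `classes.json` is not shown; the offsets and the `Truck(Wint)` naming follow the table above.

```python
from typing import Dict, List

# Suffix -> ID offset, as described for the Norm/Wint/Night modes.
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}


def build_registry(base_classes: List[dict]) -> Dict[int, dict]:
    """Expand base classes into one entry per weather mode, keyed by class ID."""
    registry: Dict[int, dict] = {}
    for suffix, offset in WEATHER_OFFSETS.items():
        for base in base_classes:
            # Normal-mode entries keep the plain name; others get a suffix.
            name = base["name"] if offset == 0 else f'{base["name"]}({suffix})'
            registry[base["id"] + offset] = {
                **base,
                "id": base["id"] + offset,
                "name": name,
            }
    return registry
```

The offsets only stay collision-free while base IDs are below 20, which holds for the 19 classes (IDs 0-18) described here.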
+
+### Detection
+
+Normalized bounding box (0..1 coordinate space).
+
+| Field | Type | Description |
+|-------|------|-------------|
+| x, y | double | Center coordinates (normalized) |
+| w, h | double | Width and height (normalized) |
+| cls | int | Class ID → maps to AnnotationClass |
+| confidence | double | Model confidence score (0..1) |
+| annotation_name | str | Back-reference to parent Annotation name |
+
+### Annotation
+
+Groups detections for a single frame or image tile.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| name | str | Unique name encoding media + tile/time info |
+| original_media_name | str | Source media filename (no extension, no spaces) |
+| time | long | Timestamp in ms (video) or 0 (image) |
+| detections | list[Detection] | Detected objects in this frame |
+| image | bytes | JPEG-encoded frame (set after validation) |
+
+### AIRecognitionConfig
+
+Runtime configuration for inference behavior. Created from dict (API) or msgpack (internal).
+
+### AIAvailabilityStatus
+
+Thread-safe engine lifecycle state. Values: NONE(0), DOWNLOADING(10), CONVERTING(20), UPLOADING(30), ENABLED(200), WARNING(300), ERROR(500).
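For reference, the listed values map naturally onto an integer enum. This Python `IntEnum` is only an illustration (the real type is a Cython enum), and `display_name` is a hypothetical helper that reproduces the capitalized strings seen in the health response.

```python
from enum import IntEnum


class AIAvailabilityEnum(IntEnum):
    NONE = 0
    DOWNLOADING = 10
    CONVERTING = 20
    UPLOADING = 30
    ENABLED = 200
    WARNING = 300
    ERROR = 500


def display_name(status: AIAvailabilityEnum) -> str:
    """Render a status as the capitalized string used in API responses."""
    return status.name.capitalize()
```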
+
+## API DTOs (Pydantic)
+
+### DetectionDto
+
+Outward-facing detection result. Maps from internal Detection + AnnotationClass label lookup.
+
+### DetectionEvent
+
+SSE event payload. Status values: AIProcessing, AIProcessed, Error.
+
+### AIConfigDto
+
+API input configuration. Same fields as AIRecognitionConfig with defaults.
+
+### HealthResponse
+
+Health check response with AI availability status string.
+
+## Annotation Naming Convention
+
+Annotation names encode media source and processing context:
+
+- **Image**: `{media_name}_000000`
+- **Image tile**: `{media_name}!split!{tile_size}_{x}_{y}!_000000`
+- **Video frame**: `{media_name}_{H}{MM}{SS}{f}` (compact time format)
+
+## Serialization Formats
+
+| Entity | Format | Usage |
+|--------|--------|-------|
+| Detection/Annotation | msgpack (compact keys) | `annotation.serialize()` |
+| AIRecognitionConfig | msgpack (compact keys) | `from_msgpack()` |
+| AIAvailabilityStatus | msgpack | `serialize()` |
+| DetectionDto/Event | JSON (Pydantic) | HTTP API responses, SSE |
+
+## No Persistent Storage
+
+This service has no database. All data is transient:
+- `classes.json` loaded at startup (read-only)
+- Model bytes downloaded from Loader on demand
+- Detection results returned via HTTP/SSE and posted to Annotations service
+- No local caching of results
@@ -0,0 +1,63 @@
+# Component Relationship Diagram
+
+```mermaid
+graph TD
+    subgraph "04 - API Layer"
+        API["main.py<br/>(FastAPI endpoints, DTOs, SSE, TokenManager)"]
+    end
+
+    subgraph "03 - Inference Pipeline"
+        INF["inference<br/>(orchestrator, preprocessing, postprocessing)"]
+        LDR["loader_http_client<br/>(model download/upload)"]
+    end
+
+    subgraph "02 - Inference Engines"
+        IE["inference_engine<br/>(abstract base)"]
+        ONNX["onnx_engine<br/>(ONNX Runtime)"]
+        TRT["tensorrt_engine<br/>(TensorRT + conversion)"]
+    end
+
+    subgraph "01 - Domain"
+        CONST["constants_inf<br/>(constants, logging, class registry)"]
+        ANNOT["annotation<br/>(Detection, Annotation)"]
+        AICFG["ai_config<br/>(AIRecognitionConfig)"]
+        STATUS["ai_availability_status<br/>(AIAvailabilityStatus)"]
+    end
+
+    subgraph "External Services"
+        LOADER["Loader Service<br/>(http://loader:8080)"]
+        ANNSVC["Annotations Service<br/>(http://annotations:8080)"]
+    end
+
+    API --> INF
+    API --> CONST
+    API --> LDR
+    API --> ANNSVC
+
+    INF --> ONNX
+    INF --> TRT
+    INF --> LDR
+    INF --> CONST
+    INF --> ANNOT
+    INF --> AICFG
+    INF --> STATUS
+
+    ONNX --> IE
+    ONNX --> CONST
+    TRT --> IE
+    TRT --> CONST
+
+    STATUS --> CONST
+    ANNOT --> CONST
+
+    LDR --> LOADER
+```
+
+## Component Summary
+
+| # | Component | Modules | Purpose |
+|---|-----------|---------|---------|
+| 01 | Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, enums, constants, logging |
+| 02 | Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML inference backends |
+| 03 | Inference Pipeline | inference, loader_http_client | Orchestration: engine lifecycle, preprocessing, postprocessing, media processing |
+| 04 | API | main | HTTP API, SSE streaming, auth token management |
@@ -0,0 +1,125 @@
+# E2E Test Environment
+
+## Overview
+
+**System under test**: Azaion.Detections — FastAPI HTTP service exposing `POST /detect`, `POST /detect/{media_id}`, `GET /detect/stream`, `GET /health`
+**Consumer app purpose**: Standalone test runner that exercises the detection service through its public HTTP/SSE interfaces, validating end-to-end use cases without access to internals.
+
+## Docker Environment
+
+### Services
+
+| Service | Image / Build | Purpose | Ports |
+|---------|--------------|---------|-------|
+| detections | Build from repo root (setup.py + Cython compile, uvicorn entrypoint) | System under test — the detection microservice | 8000:8000 |
+| mock-loader | Custom lightweight HTTP stub (Python/Node) | Mock of the Loader service — serves ONNX model files, accepts TensorRT uploads | 8080:8080 |
+| mock-annotations | Custom lightweight HTTP stub (Python/Node) | Mock of the Annotations service — accepts detection results, provides token refresh | 8081:8081 |
+| e2e-consumer | Build from `e2e/` directory | Black-box test runner (pytest) | — |
+
+### GPU Configuration
+
+For tests requiring TensorRT (GPU path):
+- Deploy `detections` with `runtime: nvidia` and `NVIDIA_VISIBLE_DEVICES=all`
+- The test suite has two profiles: `gpu` (TensorRT tests) and `cpu` (ONNX fallback tests)
+- CPU-only tests run without GPU runtime, verifying ONNX fallback behavior
+
+### Networks
+
+| Network | Services | Purpose |
+|---------|----------|---------|
+| e2e-net | all | Isolated test network — all service-to-service communication via hostnames |
+
+### Volumes
+
+| Volume | Mounted to | Purpose |
+|--------|-----------|---------|
+| test-models | mock-loader:/models | Pre-built ONNX model file for test inference |
+| test-media | e2e-consumer:/media | Sample images and video files for detection requests |
+| test-classes | detections:/app/classes.json | classes.json with 19 detection classes |
+| test-results | e2e-consumer:/results | CSV test report output |
+
+### docker-compose structure
+
+```yaml
+services:
+  mock-loader:
+    build: ./e2e/mocks/loader
+    ports: ["8080:8080"]
+    volumes:
+      - test-models:/models
+    networks: [e2e-net]
+
+  mock-annotations:
+    build: ./e2e/mocks/annotations
+    ports: ["8081:8081"]
+    networks: [e2e-net]
+
+  detections:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    ports: ["8000:8000"]
+    environment:
+      - LOADER_URL=http://mock-loader:8080
+      - ANNOTATIONS_URL=http://mock-annotations:8081
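+    volumes:
+      # Caution: a named volume mounts as a directory, so classes.json may need a bind mount of the file instead
+      - test-classes:/app/classes.json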
+    depends_on:
+      - mock-loader
+      - mock-annotations
+    networks: [e2e-net]
+    # GPU profile adds: runtime: nvidia
+
+  e2e-consumer:
+    build: ./e2e
+    volumes:
+      - test-media:/media
+      - test-results:/results
+    depends_on:
+      - detections
+    networks: [e2e-net]
+    command: pytest --csv=/results/report.csv
+
+volumes:
+  test-models:
+  test-media:
+  test-classes:
+  test-results:
+
+networks:
+  e2e-net:
+```
+
+## Consumer Application
+
+**Tech stack**: Python 3, pytest (the `--csv` option presumably comes from the pytest-csv plugin), requests, sseclient-py
+**Entry point**: `pytest --csv=/results/report.csv`
+
+### Communication with system under test
+
+| Interface | Protocol | Endpoint | Authentication |
+|-----------|----------|----------|----------------|
+| Health check | HTTP GET | `http://detections:8000/health` | None |
+| Single image detect | HTTP POST (multipart) | `http://detections:8000/detect` | None |
+| Media detect | HTTP POST (JSON) | `http://detections:8000/detect/{media_id}` | Bearer JWT + x-refresh-token headers |
+| SSE stream | HTTP GET (SSE) | `http://detections:8000/detect/stream` | None |
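The authenticated media-detect call in the table above can be sketched as a small request builder. This is a minimal sketch, not the consumer's actual code: the function name `media_detect_request` and the idea of sending the config dict as the JSON body are assumptions; only the hostname, path, and header names come from the table.

```python
from urllib.parse import urljoin

BASE_URL = "http://detections:8000"  # service hostname on e2e-net, from the table above


def media_detect_request(media_id, jwt, refresh_token, config):
    """Return (method, url, headers, json_body) for POST /detect/{media_id}.

    Hypothetical helper: the body shape (config dict sent as JSON) is an assumption.
    """
    headers = {
        "Authorization": f"Bearer {jwt}",
        "x-refresh-token": refresh_token,
    }
    url = urljoin(BASE_URL, f"/detect/{media_id}")
    return "POST", url, headers, config


method, url, headers, body = media_detect_request(
    "test-media-001", "test-jwt-123", "refresh-456", {"paths": ["/media/test-video.mp4"]}
)
```

With the `requests` library from the tech stack, the tuple could be sent as `requests.request(method, url, headers=headers, json=body)`.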
+
+### What the consumer does NOT have access to
+
+- No direct import of Cython modules (inference, annotation, engines)
+- No direct access to the detections service filesystem or Logs/ directory
+- No shared memory with the detections process
+- No direct calls to mock-loader or mock-annotations (except for test setup/teardown verification)
+
+## CI/CD Integration
+
+**When to run**: On PR merge to dev, nightly scheduled run
+**Pipeline stage**: After unit tests, before deployment
+**Gate behavior**: Block merge if any functional test fails; non-functional failures are warnings
+**Timeout**: 15 minutes for CPU profile, 30 minutes for GPU profile
+
+## Reporting
+
+**Format**: CSV
+**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
+**Output path**: `/results/report.csv` (mounted volume → `./e2e-results/report.csv` on host)
@@ -0,0 +1,591 @@
+# E2E Functional Tests
+
+## Positive Scenarios
+
+### FT-P-01: Health check returns status before engine initialization
+
+**Summary**: Verify the health endpoint responds correctly when the inference engine has not yet been initialized.
+**Traces to**: AC-API-1, AC-EL-1
+**Category**: API, Engine Lifecycle
+
+**Preconditions**:
+- Detections service is running
+- No detection requests have been made (engine is not initialized)
+
+**Input data**: None
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `GET /health` | 200 OK with `{"status": "healthy", "aiAvailability": "None"}` |
+
+**Expected outcome**: Health endpoint returns `status: "healthy"` and `aiAvailability: "None"` (engine not yet loaded).
+**Max execution time**: 2s
+
+---
+
+### FT-P-02: Health check reflects engine availability after initialization
+
+**Summary**: Verify the health endpoint reports the correct engine state after the engine has been initialized by a detection request.
+**Traces to**: AC-API-1, AC-EL-2
+**Category**: API, Engine Lifecycle
+
+**Preconditions**:
+- Detections service is running
+- Mock-loader serves the ONNX model file
+- At least one successful detection has been performed (engine initialized)
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image (trigger engine init) | 200 OK with detection results |
+| 2 | `GET /health` | 200 OK with `aiAvailability` set to `"Enabled"` or `"Warning"` |
+
+**Expected outcome**: `aiAvailability` reflects an initialized engine state (not `"None"` or `"Downloading"`).
+**Max execution time**: 30s (includes engine init on first call)
+
+---
+
+### FT-P-03: Single image detection returns detections
+
+**Summary**: Verify that a valid small image submitted via POST /detect returns structured detection results.
+**Traces to**: AC-DA-1, AC-API-2
+**Category**: Detection Accuracy, API
+
+**Preconditions**:
+- Engine is initialized (or will be on this call)
+- Mock-loader serves the model
+
+**Input data**: small-image (640×480, contains detectable objects)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image as multipart file | 200 OK |
+| 2 | Parse response JSON | Array of detection objects, each with `x`, `y`, `width`, `height`, `label`, `confidence` |
+| 3 | Verify all confidence values | Every detection has `confidence >= 0.25` (default probability_threshold) |
+
+**Expected outcome**: Non-empty array of DetectionDto objects. All confidences meet threshold. Each detection has valid bounding box coordinates (0.0–1.0 range).
+**Max execution time**: 30s
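The checks in steps 2 and 3 can be expressed as one validation helper. This is a sketch under the assumption that the response is a JSON array of flat objects with exactly the field names listed above; `validate_detections` is a hypothetical name.

```python
REQUIRED_FIELDS = ("x", "y", "width", "height", "label", "confidence")


def validate_detections(detections, threshold=0.25):
    """Return a list of problems; an empty list means every detection passed."""
    problems = []
    for i, det in enumerate(detections):
        missing = [f for f in REQUIRED_FIELDS if f not in det]
        if missing:
            problems.append(f"detection {i}: missing fields {missing}")
            continue
        if det["confidence"] < threshold:
            problems.append(f"detection {i}: confidence {det['confidence']} below {threshold}")
        for field in ("x", "y", "width", "height"):
            # Bounding box values are expected to be normalized to 0.0-1.0.
            if not 0.0 <= det[field] <= 1.0:
                problems.append(f"detection {i}: {field}={det[field]} outside 0.0-1.0")
    return problems
```

A test would then assert that the response is non-empty and that `validate_detections(response_json)` returns an empty list.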
+
+---
+
+### FT-P-04: Large image triggers GSD-based tiling
+
+**Summary**: Verify that an image exceeding 1.5× model dimensions is tiled and processed with tile-level detection results merged.
+**Traces to**: AC-IP-1, AC-IP-2
+**Category**: Image Processing
+
+**Preconditions**:
+- Engine is initialized
+- Config includes altitude, focal_length, sensor_width for GSD calculation
+
+**Input data**: large-image (4000×3000)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with large-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
+| 2 | Parse response JSON | Array of detections |
+| 3 | Verify detection coordinates | Bounding box coordinates are in 0.0–1.0 range relative to the full original image |
+
+**Expected outcome**: Detections returned for the full image. Coordinates are normalized to original image dimensions (not tile dimensions). Processing time is longer than small-image due to tiling.
+**Max execution time**: 60s
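To make the tiling expectation concrete, the GSD formula from the acceptance criteria can be evaluated for this test's config. The sketch assumes `sensor_width` and `focal_length` are in millimeters and `altitude` is in meters, and that tiling triggers when either image side exceeds 1.5× the model dimension; neither assumption is confirmed by the source.

```python
MODEL_SIZE_PX = 1280   # model input dimension from the acceptance criteria
METERS_IN_TILE = 25    # tile physical size from the acceptance criteria


def ground_sampling_distance(sensor_width_mm, altitude_m, focal_length_mm, image_width_px):
    """GSD in meters per pixel (unit choices are assumptions, see lead-in)."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)


def needs_tiling(width_px, height_px):
    # Assumption: either side larger than 1.5x the model dimension triggers tiling.
    limit = 1.5 * MODEL_SIZE_PX
    return width_px > limit or height_px > limit


gsd = ground_sampling_distance(23.5, 400, 24, 4000)  # roughly 0.098 m per pixel
tile_px = METERS_IN_TILE / gsd                       # roughly 255 pixels per 25 m tile
```

Under these assumptions the 4000×3000 large-image is tiled, while the 640×480 small-image is processed as a single frame.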
+
+---
+
+### FT-P-05: Detection confidence filtering respects threshold
+
+**Summary**: Verify that detections below the configured probability_threshold are filtered out.
+**Traces to**: AC-DA-1
+**Category**: Detection Accuracy
+
+**Preconditions**:
+- Engine is initialized
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image and config `{"probability_threshold": 0.8}` | 200 OK |
+| 2 | Parse response JSON | All returned detections have `confidence >= 0.8` |
+| 3 | `POST /detect` with same image and config `{"probability_threshold": 0.1}` | 200 OK |
+| 4 | Compare result counts | Step 3 returns at least as many detections as Step 1 |
+
+**Expected outcome**: Higher threshold produces fewer or equal detections. No detection below threshold appears in results.
+**Max execution time**: 30s
+
+---
+
+### FT-P-06: Overlapping detections are deduplicated
+
+**Summary**: Verify that overlapping detections with containment ratio above threshold are deduplicated, keeping the higher-confidence one.
+**Traces to**: AC-DA-2
+**Category**: Detection Accuracy
+
+**Preconditions**:
+- Engine is initialized
+- Image produces overlapping detections (dense scene)
+
+**Input data**: small-image (scene with clustered objects)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image and config `{"tracking_intersection_threshold": 0.6}` | 200 OK |
+| 2 | Collect detections | No two detections of the same class overlap by more than 60% containment ratio |
+| 3 | `POST /detect` with same image and config `{"tracking_intersection_threshold": 0.01}` | 200 OK |
+| 4 | Compare result counts | Step 3 returns fewer or equal detections (more aggressive dedup) |
+
+**Expected outcome**: No pair of returned detections exceeds the configured overlap threshold.
+**Max execution time**: 30s
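Step 2 requires a pairwise overlap check on the consumer side. The sketch below assumes that `(x, y)` is the box center in normalized coordinates and that the containment ratio is the intersection area divided by the smaller box's area; the service's actual definition may differ, so treat both as placeholders.

```python
from itertools import combinations


def to_corners(det):
    # Assumption: (x, y) is the box center, width/height are full extents.
    half_w, half_h = det["width"] / 2, det["height"] / 2
    return det["x"] - half_w, det["y"] - half_h, det["x"] + half_w, det["y"] + half_h


def containment_ratio(a, b):
    """Intersection area over the smaller box's area (assumed definition)."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    smaller = min(a["width"] * a["height"], b["width"] * b["height"])
    return (inter_w * inter_h) / smaller if smaller > 0 else 0.0


def overlapping_pairs(detections, threshold=0.6):
    """Index pairs of same-class detections whose containment exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(detections), 2)
        if a["label"] == b["label"] and containment_ratio(a, b) > threshold
    ]
```

The test passes when `overlapping_pairs(response_json, 0.6)` is empty.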
+
+---
+
+### FT-P-07: Physical size filtering removes oversized detections
+
+**Summary**: Verify that detections exceeding the MaxSizeM for their class (given GSD) are removed.
+**Traces to**: AC-DA-4
+**Category**: Detection Accuracy
+
+**Preconditions**:
+- Engine is initialized
+- classes.json loaded with MaxSizeM values
+
+**Input data**: small-image, config with known GSD parameters
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
+| 2 | For each detection, compute physical size from bounding box + GSD | No detection's physical size exceeds the MaxSizeM defined for its class in classes.json |
+
+**Expected outcome**: All returned detections have plausible physical dimensions for their class.
+**Max execution time**: 30s
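The physical-size computation in step 2 can be sketched as follows. It assumes normalized `width` and `height` fields, a GSD in meters per pixel, and that a detection counts as oversized when its longer side exceeds the class limit; the `max_size_m_by_label` mapping is a hypothetical stand-in for the MaxSizeM values in classes.json.

```python
def physical_size_m(det, image_width_px, image_height_px, gsd_m_per_px):
    """Return (width_m, height_m) of a normalized bounding box on the ground."""
    width_m = det["width"] * image_width_px * gsd_m_per_px
    height_m = det["height"] * image_height_px * gsd_m_per_px
    return width_m, height_m


def oversized_detections(detections, max_size_m_by_label, image_width_px, image_height_px, gsd_m_per_px):
    """Detections whose longer side exceeds the limit for their class (assumed rule)."""
    flagged = []
    for det in detections:
        width_m, height_m = physical_size_m(det, image_width_px, image_height_px, gsd_m_per_px)
        if max(width_m, height_m) > max_size_m_by_label[det["label"]]:
            flagged.append(det)
    return flagged
```

The test passes when `oversized_detections(...)` returns an empty list for the response.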
+
+---
+
+### FT-P-08: Async media detection returns "started" immediately
+
+**Summary**: Verify that POST /detect/{media_id} returns immediately with status "started" while processing continues in background.
+**Traces to**: AC-API-3
+**Category**: API
+
+**Preconditions**:
+- Engine is initialized
+- Media file paths are available via config
+
+**Input data**: jwt-token, test-video path in config
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect/test-media-001` with config paths and auth headers | 200 OK, `{"status": "started"}` |
+| 2 | Measure response time | Response arrives within 1s (before video processing completes) |
+
+**Expected outcome**: Immediate response with `{"status": "started"}`. Processing continues asynchronously.
+**Max execution time**: 2s (response only; processing continues in background)
+
+---
+
+### FT-P-09: SSE streaming delivers detection events during async processing
+
+**Summary**: Verify that SSE clients receive real-time detection events during async media detection.
+**Traces to**: AC-API-4, AC-API-3
+**Category**: API
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected before triggering detection
+
+**Input data**: jwt-token, test-video path in config
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Open SSE connection: `GET /detect/stream` | Connection established |
+| 2 | `POST /detect/test-media-002` with config and auth headers | `{"status": "started"}` |
+| 3 | Listen on SSE connection | Receive events with `mediaStatus: "AIProcessing"` as frames are processed |
+| 4 | Wait for completion | Final event with `mediaStatus: "AIProcessed"` and `percent: 100` |
+
+**Expected outcome**: Multiple SSE events received. Events include detection data. Final event signals completion.
+**Max execution time**: 120s
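Steps 3 and 4 amount to parsing the event stream and stopping at the completion event. The sketch below parses raw SSE text lines without any client library; it assumes each event's `data:` payload is a JSON object carrying `mediaStatus` and `percent`, which the source implies but does not spell out.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line terminates the event.
            yield json.loads("\n".join(data_parts))
            data_parts = []


def collect_until_processed(events):
    """Return (events_seen, completed) where completed means AIProcessed arrived."""
    seen = []
    for event in events:
        seen.append(event)
        if event.get("mediaStatus") == "AIProcessed":
            return seen, True
    return seen, False
```

In the real test the lines would come from the streaming HTTP response (or from sseclient-py), guarded by the 120s timeout.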
+
+---
+
+### FT-P-10: Video frame sampling processes every Nth frame
+
+**Summary**: Verify that video processing respects the `frame_period_recognition` setting.
+**Traces to**: AC-VP-1
+**Category**: Video Processing
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected
+
+**Input data**: test-video (10s, 30fps = 300 frames), config `{"frame_period_recognition": 4}`
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Open SSE connection | Connection established |
+| 2 | `POST /detect/test-media-003` with config `{"frame_period_recognition": 4, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
+| 3 | Count distinct SSE events with detection data | Number of processed frames ≈ 300/4 = 75 (±10% tolerance for start/end frames) |
+
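+**Expected outcome**: Approximately 75 frames processed (not all 300). The count scales proportionally with frame_period_recognition. Because the annotation interval (`frame_recognition_seconds`) and tracking filters can suppress reported events, the SSE event count matches the processed-frame count only if those filters do not drop events for this video.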
+**Max execution time**: 120s
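The arithmetic behind step 3 is small enough to encode directly. This sketch assumes an integer frame rate; both function names are hypothetical.

```python
def expected_processed_frames(duration_s, fps, frame_period):
    """Every Nth frame is processed: 10 s * 30 fps // 4 = 75."""
    return (duration_s * fps) // frame_period


def within_tolerance(observed, expected, tolerance=0.10):
    """True when the observed count is within +/- tolerance of the expected count."""
    return abs(observed - expected) <= tolerance * expected
```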
+
+---
+
+### FT-P-11: Video annotation interval enforcement
+
+**Summary**: Verify that annotations are not reported more frequently than `frame_recognition_seconds`.
+**Traces to**: AC-VP-2
+**Category**: Video Processing
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected
+
+**Input data**: test-video, config `{"frame_recognition_seconds": 2}`
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Open SSE connection | Connection established |
+| 2 | `POST /detect/test-media-004` with config `{"frame_recognition_seconds": 2, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
+| 3 | Record timestamps of consecutive SSE detection events | Minimum gap between consecutive annotation events ≥ 2 seconds |
+
+**Expected outcome**: No two annotation events are closer than 2 seconds apart.
+**Max execution time**: 120s
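The interval check in step 3 reduces to the smallest gap between consecutive timestamps. Whether the timestamps should be client arrival times or video-time values carried in the events is not stated in the source; the sketch works on any list of seconds.

```python
def min_gap_seconds(timestamps):
    """Smallest gap between consecutive timestamps; infinity for fewer than two."""
    ordered = sorted(timestamps)
    return min((b - a for a, b in zip(ordered, ordered[1:])), default=float("inf"))
```

The test passes when `min_gap_seconds(event_times) >= 2`.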
+
+---
+
+### FT-P-12: Video tracking accepts new annotations on movement
+
+**Summary**: Verify that new annotations are accepted when detections move beyond the tracking threshold.
+**Traces to**: AC-VP-3
+**Category**: Video Processing
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected
+- Video contains moving objects
+
+**Input data**: test-video, config with `tracking_distance_confidence > 0`
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Open SSE connection | Connection established |
+| 2 | `POST /detect/test-media-005` with config `{"tracking_distance_confidence": 0.05, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
+| 3 | Collect SSE events | Annotations are emitted when object positions change between frames |
+
+**Expected outcome**: Annotations contain updated positions reflecting object movement. Static objects do not generate redundant annotations.
+**Max execution time**: 120s
+
+---
+
+### FT-P-13: Weather mode class variants
+
+**Summary**: Verify that the system supports detection across different weather mode class variants (Norm, Wint, Night).
+**Traces to**: AC-OC-1
+**Category**: Object Classes
+
+**Preconditions**:
+- Engine is initialized
+- classes.json includes weather-mode variants
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image | 200 OK |
+| 2 | Inspect returned detection labels | Labels correspond to valid class names from classes.json (base or weather-variant) |
+
+**Expected outcome**: All returned labels are valid entries from the 19-class × 3-mode registry.
+**Max execution time**: 30s
+
+---
+
+### FT-P-14: Engine lazy initialization on first detection request
+
+**Summary**: Verify that the engine is not initialized at startup but is initialized on the first detection request.
+**Traces to**: AC-EL-1, AC-EL-2
+**Category**: Engine Lifecycle
+
+**Preconditions**:
+- Fresh service start, no prior requests
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `GET /health` immediately after service starts | `aiAvailability: "None"` — engine not loaded |
+| 2 | `POST /detect` with small-image | 200 OK (may take longer — engine initializing) |
+| 3 | `GET /health` | `aiAvailability` changed to an active state (`"Enabled"` or `"Warning"`) |
+
+**Expected outcome**: Engine transitions from "None" to an active state only after a detection request.
+**Max execution time**: 60s
+
+---
+
+### FT-P-15: ONNX fallback when GPU unavailable
+
+**Summary**: Verify that the system falls back to ONNX Runtime when no compatible GPU is available.
+**Traces to**: AC-EL-2, RESTRICT-HW-1
+**Category**: Engine Lifecycle
+
+**Preconditions**:
+- Detections service running WITHOUT GPU runtime (CPU-only Docker profile)
+- Mock-loader serves ONNX model
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with small-image | 200 OK with detection results |
+| 2 | `GET /health` | `aiAvailability` indicates engine is active (ONNX fallback) |
+
+**Expected outcome**: Detection succeeds via ONNX Runtime. No TensorRT-related errors.
+**Max execution time**: 60s
+
+---
+
+### FT-P-16: Tile deduplication removes duplicate detections at tile boundaries
+
+**Summary**: Verify that detections appearing in overlapping tile regions are deduplicated.
+**Traces to**: AC-DA-3
+**Category**: Detection Accuracy
+
+**Preconditions**:
+- Engine is initialized
+- Large image that triggers tiling
+
+**Input data**: large-image with config including GSD parameters and `big_image_tile_overlap_percent: 20`
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with large-image and tiling config | 200 OK |
+| 2 | Inspect detections near tile boundaries | No two detections of the same class have all bounding box coordinates within 0.01 of each other (TILE_DUPLICATE_CONFIDENCE_THRESHOLD) |
+
+**Expected outcome**: Tile boundary detections are merged. No duplicates with near-identical coordinates remain.
+**Max execution time**: 60s
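The duplicate rule can be checked pairwise on the response. The sketch assumes the four compared coordinates are `x`, `y`, `width`, and `height`; the service may compare corner coordinates instead.

```python
from itertools import combinations

COORD_FIELDS = ("x", "y", "width", "height")  # assumed set of compared coordinates


def is_tile_duplicate(a, b, tolerance=0.01):
    """Same class and every coordinate differs by less than the tolerance."""
    return a["label"] == b["label"] and all(
        abs(a[field] - b[field]) < tolerance for field in COORD_FIELDS
    )


def tile_duplicates(detections, tolerance=0.01):
    """Index pairs of detections that would count as tile duplicates."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(detections), 2)
        if is_tile_duplicate(a, b, tolerance)
    ]
```

The test passes when `tile_duplicates(response_json)` is empty.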
+
+---
+
+## Negative Scenarios
+
+### FT-N-01: Empty image returns 400
+
+**Summary**: Verify that submitting an empty file to POST /detect returns a 400 error.
+**Traces to**: AC-API-2 (negative case)
+**Category**: API
+
+**Preconditions**:
+- Detections service is running
+
+**Input data**: empty-image (zero-byte file)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with empty-image as multipart file | 400 Bad Request |
+
+**Expected outcome**: HTTP 400 with error message indicating empty or invalid image.
+**Max execution time**: 5s
+
+---
+
+### FT-N-02: Invalid image data returns 400
+
+**Summary**: Verify that submitting a corrupt/non-image file returns a 400 error.
+**Traces to**: AC-API-2 (negative case)
+**Category**: API
+
+**Preconditions**:
+- Detections service is running
+
+**Input data**: corrupt-image (random binary data)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect` with corrupt-image as multipart file | 400 Bad Request |
+
+**Expected outcome**: HTTP 400. Image decoding fails gracefully with an error response (not a 500).
+**Max execution time**: 5s
+
+---
+
+### FT-N-03: Detection when engine unavailable returns 503
+
+**Summary**: Verify that a detection request returns 503 when the engine cannot be initialized.
+**Traces to**: AC-API-2 (negative case), AC-EL-2
+**Category**: API, Engine Lifecycle
+
+**Preconditions**:
+- Mock-loader configured to return errors (model download fails)
+- Engine has not been previously initialized
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Configure mock-loader to return 503 on model requests | — |
+| 2 | `POST /detect` with small-image | 503 Service Unavailable or 422 |
+
+**Expected outcome**: HTTP 503 or 422 error indicating engine is not available. No crash or unhandled exception.
+**Max execution time**: 30s
+
+---
+
+### FT-N-04: Duplicate media_id returns 409
+
+**Summary**: Verify that submitting a second async detection request with an already-active media_id returns 409.
+**Traces to**: AC-API-3 (negative case)
+**Category**: API
+
+**Preconditions**:
+- Engine is initialized
+- An async detection is already in progress for media_id "dup-test"
+
+**Input data**: jwt-token, test-video
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | `POST /detect/dup-test` with config and auth headers | `{"status": "started"}` |
+| 2 | Immediately `POST /detect/dup-test` again (same media_id) | 409 Conflict |
+
+**Expected outcome**: Second request is rejected with 409. First detection continues normally.
+**Max execution time**: 5s
+
+---
+
+### FT-N-05: Missing classes.json prevents startup or yields no detections
+
+**Summary**: Verify that the service fails or returns no detections when classes.json is not present.
+**Traces to**: RESTRICT-SW-4
+**Category**: Restrictions
+
+**Preconditions**:
+- Detections service started WITHOUT classes.json volume mount
+
+**Input data**: None
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Attempt to start detections service without classes.json | Service fails to start OR starts with empty class registry |
+| 2 | If started: `POST /detect` with small-image | Empty detections or error response |
+
+**Expected outcome**: Service either fails to start or returns no detections. No unhandled crash.
+**Max execution time**: 30s
+
+---
+
+### FT-N-06: Loader service unreachable during model download
+
+**Summary**: Verify that the system handles Loader service being unreachable during engine initialization.
+**Traces to**: RESTRICT-ENV-1, AC-EL-2
+**Category**: Resilience, Engine Lifecycle
+
+**Preconditions**:
+- Mock-loader is stopped or unreachable
+- Engine not yet initialized
+
+**Input data**: small-image
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Stop mock-loader service | — |
+| 2 | `POST /detect` with small-image | Error response (503 or 422) |
+| 3 | `GET /health` | `aiAvailability` reflects error state |
+
+**Expected outcome**: Detection fails gracefully. Health endpoint reflects the engine error state.
+**Max execution time**: 30s
+
+---
+
+### FT-N-07: Annotations service unreachable — detection continues
+
+**Summary**: Verify that async detection continues even when the Annotations service is unreachable.
+**Traces to**: RESTRICT-ENV-2
+**Category**: Resilience
+
+**Preconditions**:
+- Engine is initialized
+- Mock-annotations is stopped or returns errors
+- SSE client connected
+
+**Input data**: jwt-token, test-video
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Stop mock-annotations service | — |
+| 2 | `POST /detect/test-media-006` with config and auth | `{"status": "started"}` |
+| 3 | Listen on SSE | Detection events still arrive (annotations POST failure is silently caught) |
+| 4 | Wait for completion | Final `AIProcessed` event received |
+
+**Expected outcome**: Detection processing completes. SSE events are delivered. Annotations POST failure does not stop the detection pipeline.
+**Max execution time**: 120s
+
+---
+
+### FT-N-08: SSE queue overflow events are silently dropped
+
+**Summary**: Verify that when an SSE client's queue reaches 100 events, additional events are dropped without error.
+**Traces to**: AC-API-4
+**Category**: API
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected but NOT consuming events (stalled reader)
+
+**Input data**: test-video (generates many events)
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Open SSE connection but pause reading | Connection established |
+| 2 | `POST /detect/test-media-007` with config that generates > 100 events | `{"status": "started"}` |
+| 3 | Wait for processing to complete | No error on the detection side |
+| 4 | Resume reading SSE | Receive ≤ 100 events (queue max depth) |
+
+**Expected outcome**: No crash or error. Overflow events are silently dropped. Detection completes normally.
+**Max execution time**: 120s
@@ -0,0 +1,325 @@
+# E2E Non-Functional Tests
+
+## Performance Tests
+
+### NFT-PERF-01: Single image detection latency
+
+**Summary**: Measure end-to-end latency for a single small image detection request after engine is warm.
+**Traces to**: AC-API-2
+**Metric**: Request-to-response latency (ms)
+
+**Preconditions**:
+- Engine is initialized and warm (at least 1 prior detection)
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | Send 10 sequential `POST /detect` with small-image | Record each request-response latency |
+| 2 | Compute p50, p95, p99 | — |
+
+**Pass criteria**: p95 latency < 5000ms for ONNX CPU, p95 < 1000ms for TensorRT GPU
+**Duration**: ~60s (10 requests)
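Step 2 does not say which percentile method to use. The sketch below uses the nearest-rank method, which is one reasonable choice; with only 10 samples, p95 and p99 both equal the slowest request.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

For example, the CPU pass criterion becomes `percentile(latencies_ms, 95) < 5000`.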
+
+---
+
+### NFT-PERF-02: Concurrent inference throughput
+
+**Summary**: Verify the system handles 2 concurrent inference requests (ThreadPoolExecutor limit).
+**Traces to**: RESTRICT-HW-3
+**Metric**: Throughput (requests/second), latency under concurrency
+
+**Preconditions**:
+- Engine is initialized and warm
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | Send 2 concurrent `POST /detect` requests with small-image | Measure both response times |
+| 2 | Send 3 concurrent requests | Third request should queue behind the first two |
+| 3 | Record total time for 3 concurrent requests vs 2 concurrent | — |
+
+**Pass criteria**: 2 concurrent requests complete without error. 3 concurrent requests: total time > time for 2 (queuing observed).
+**Duration**: ~30s
+
+---
+
+### NFT-PERF-03: Large image tiling processing time
+
+**Summary**: Measure processing time for a large image that triggers GSD-based tiling.
+**Traces to**: AC-IP-2
+**Metric**: Total processing time (ms), tiles processed
+
+**Preconditions**:
+- Engine is initialized and warm
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | `POST /detect` with large-image (4000×3000) and GSD config | Record total response time |
+| 2 | Compare with small-image baseline from NFT-PERF-01 | Ratio indicates tiling overhead |
+
+**Pass criteria**: Request completes within 120s. Processing time scales roughly linearly with the number of tiles (not exponentially).
+**Duration**: ~120s
+
+---
+
+### NFT-PERF-04: Video processing frame rate
+
+**Summary**: Measure effective frame processing rate during video detection.
+**Traces to**: AC-VP-1
+**Metric**: Frames processed per second, total processing time
+
+**Preconditions**:
+- Engine is initialized and warm
+- SSE client connected
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | `POST /detect/test-media-perf` with test-video and `frame_period_recognition: 4` | — |
+| 2 | Count SSE events and measure total time from "started" to "AIProcessed" | Compute frames/second |
+
+**Pass criteria**: Processing completes within 5× video duration (10s video → < 50s processing). Frame processing rate is consistent (no stalls > 10s between events).
+**Duration**: ~120s
+
+---
+
+## Resilience Tests
+
+### NFT-RES-01: Loader service outage after engine initialization
+
+**Summary**: Verify that detections continue working when the Loader service goes down after the engine is already loaded.
+**Traces to**: RESTRICT-ENV-1
+
+**Preconditions**:
+- Engine is initialized (model already downloaded)
+
+**Fault injection**:
+- Stop mock-loader service
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Stop mock-loader | — |
+| 2 | `POST /detect` with small-image | 200 OK — detection succeeds (engine already in memory) |
+| 3 | `GET /health` | `aiAvailability` remains "Enabled" |
+
+**Pass criteria**: Detection continues to work. Health status remains stable. No errors from loader unavailability.
+
+---
+
+### NFT-RES-02: Annotations service outage during async detection
+
+**Summary**: Verify that async detection completes and delivers SSE events even when Annotations service is down.
+**Traces to**: RESTRICT-ENV-2
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected
+
+**Fault injection**:
+- Stop mock-annotations mid-processing
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Start async detection: `POST /detect/test-media-res01` | `{"status": "started"}` |
+| 2 | After first few SSE events, stop mock-annotations | — |
+| 3 | Continue listening to SSE | Events continue arriving. Annotations POST failures are silently caught |
+| 4 | Wait for completion | Final `AIProcessed` event received |
+
+**Pass criteria**: Detection pipeline completes fully. SSE delivery is unaffected. No crash or 500 errors.
+
+---
+
+### NFT-RES-03: Engine initialization retry after transient loader failure
+
+**Summary**: Verify that if model download fails on first attempt, a subsequent detection request retries initialization.
+**Traces to**: AC-EL-2
+
+**Preconditions**:
+- Fresh service (engine not initialized)
+
+**Fault injection**:
+- Mock-loader returns 503 on first model request, then recovers
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Configure mock-loader to fail first request | — |
+| 2 | `POST /detect` with small-image | Error (503 or 422) |
+| 3 | Configure mock-loader to succeed | — |
+| 4 | `POST /detect` with small-image | 200 OK — engine initializes on retry |
+
+**Pass criteria**: Second detection succeeds after loader recovers. System does not permanently lock into error state.
+
+---
+
+### NFT-RES-04: Service restart with in-memory state loss
+
+**Summary**: Verify that after a service restart, all in-memory state (_active_detections, _event_queues) is cleanly reset.
+**Traces to**: RESTRICT-OP-5, RESTRICT-OP-6
+
+**Preconditions**:
+- Previous detection may have been in progress
+
+**Fault injection**:
+- Restart detections container
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Restart detections container | — |
+| 2 | `GET /health` | Returns `aiAvailability: "None"` (fresh start) |
+| 3 | `POST /detect/any-media-id` | Accepted (no stale _active_detections blocking it) |
+
+**Pass criteria**: No stale state from previous session. All endpoints functional after restart.
+
+---
+
+## Security Tests
+
+### NFT-SEC-01: Malformed multipart payload handling
+
+**Summary**: Verify that the service handles malformed multipart requests without crashing.
+**Traces to**: AC-API-2 (security)
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | Send `POST /detect` with truncated multipart body (missing boundary) | 400 or 422 — not 500 |
+| 2 | Send `POST /detect` with Content-Type: multipart but no file part | 400 — empty image |
+| 3 | `GET /health` after malformed requests | Service is still healthy |
+
+**Pass criteria**: All malformed requests return 4xx. Service remains operational.
+
+---
+
+### NFT-SEC-02: Oversized request body
+
+**Summary**: Verify system behavior when an extremely large file is uploaded.
+**Traces to**: RESTRICT-OP-4
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | Send `POST /detect` with a 500 MB random file | Error response (413, 400, or timeout) — not OOM crash |
+| 2 | `GET /health` | Service is still running |
+
+**Pass criteria**: Service does not crash or run out of memory. Returns an error or times out gracefully.
+
+---
+
+### NFT-SEC-03: JWT token is forwarded without modification
+
+**Summary**: Verify that the Authorization header is forwarded to the Annotations service as-is.
+**Traces to**: AC-API-3
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | `POST /detect/test-media-sec` with `Authorization: Bearer test-jwt-123` and `x-refresh-token: refresh-456` | `{"status": "started"}` |
+| 2 | After processing, query mock-annotations `GET /mock/annotations` | Recorded request contains `Authorization: Bearer test-jwt-123` header |
+
+**Pass criteria**: Exact token received by mock-annotations matches what the consumer sent.
+
+---
+
+## Resource Limit Tests
+
+### NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
+
+**Summary**: Verify that no more than 2 inference operations run simultaneously.
+**Traces to**: RESTRICT-HW-3
+
+**Preconditions**:
+- Engine is initialized
+
+**Monitoring**:
+- Track concurrent request timings
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Send 4 concurrent `POST /detect` requests | — |
+| 2 | Measure response arrival times | First 2 complete roughly together; next 2 complete after |
+
+**Duration**: ~60s
+**Pass criteria**: Clear evidence of 2-at-a-time processing (second batch starts after first completes). All 4 requests eventually succeed.
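One way to turn "clear evidence of 2-at-a-time processing" into an assertion is to group response completion times into batches separated by a gap. The gap threshold is a tuning assumption that depends on single-request latency; `completion_batches` is a hypothetical helper.

```python
def completion_batches(completion_times, gap_s):
    """Group completion times into batches; a gap larger than gap_s starts a new batch."""
    batches = []
    for t in sorted(completion_times):
        if batches and t - batches[-1][-1] <= gap_s:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches
```

For 4 concurrent requests against a 2-worker pool, the expected batch sizes are `[2, 2]`.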
+
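The 2-at-a-time behavior this test looks for can be illustrated with a small sketch. It assumes the service uses a standard `concurrent.futures.ThreadPoolExecutor` with `max_workers=2`; `fake_inference` and `measure_peak_concurrency` are hypothetical names, and a short sleep stands in for a real inference call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak_concurrency(num_tasks: int = 4, max_workers: int = 2,
                             task_seconds: float = 0.05) -> int:
    """Run num_tasks dummy jobs and return the highest number seen running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_inference() -> None:
        # Hypothetical stand-in for one inference call.
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(task_seconds)
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fake_inference) for _ in range(num_tasks)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return peak


peak = measure_peak_concurrency()
print(f"peak concurrent tasks: {peak}")
```

With `max_workers=2` the peak cannot exceed 2, which is the same limit the E2E test infers indirectly from response arrival times.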
+---
+
+### NFT-RES-LIM-02: SSE queue depth limit (100 events)
+
+**Summary**: Verify that the SSE queue per client does not exceed 100 events.
+**Traces to**: AC-API-4
+
+**Preconditions**:
+- Engine is initialized
+
+**Monitoring**:
+- SSE event count
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Open SSE connection but do not read (stall client) | — |
+| 2 | Trigger async detection that produces > 100 events | — |
+| 3 | After processing completes, drain the SSE queue | ≤ 100 events received |
+
+**Duration**: ~120s
+**Pass criteria**: No more than 100 events buffered. No OOM or connection errors from queue growth.
+
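A bounded per-client queue of the kind this test targets might look like the sketch below. It assumes an `asyncio.Queue` with `maxsize=100` and a drop-newest policy when the queue is full; the source states only the depth limit, so the drop policy and the helper names `enqueue_event` and `drain` are assumptions.

```python
import asyncio

MAX_QUEUE_DEPTH = 100  # per-client limit stated in the test spec


def enqueue_event(queue: asyncio.Queue, event: dict) -> bool:
    """Try to buffer an event; return False if the queue is full."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False  # assumption: the newest event is dropped


def drain(queue: asyncio.Queue) -> list:
    """Remove and return everything currently buffered."""
    events = []
    while not queue.empty():
        events.append(queue.get_nowait())
    return events


client_queue = asyncio.Queue(maxsize=MAX_QUEUE_DEPTH)
accepted = sum(enqueue_event(client_queue, {"seq": i}) for i in range(150))
buffered = drain(client_queue)
print(accepted, len(buffered))
```

A stalled client therefore costs at most 100 buffered events, however many the detection run produces.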
+---
+
+### NFT-RES-LIM-03: Max 300 detections per frame
+
+**Summary**: Verify that the system returns at most 300 detections per frame (model output limit).
+**Traces to**: RESTRICT-SW-6
+
+**Preconditions**:
+- Engine is initialized
+- Image with dense scene expected to produce many detections
+
+**Monitoring**:
+- Detection count per response
+
+**Duration**: ~30s
+**Pass criteria**: No response contains more than 300 detections. Dense images hit the cap without errors.
+
+---
+
+### NFT-RES-LIM-04: Log file rotation and retention
+
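+**Summary**: Verify that log files use daily, date-based file names and contain structured entries. The ~10s run does not cross a rotation boundary, so the 30-day retention setting is not exercised directly.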
+**Traces to**: AC-LOG-1, AC-LOG-2
+
+**Preconditions**:
+- Detections service running with Logs/ volume mounted for inspection
+
+**Monitoring**:
+- Log file creation, naming, and count
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Make several detection requests | Logs written to `Logs/log_inference_YYYYMMDD.txt` |
+| 2 | Verify log file name matches current date | File name contains today's date |
+| 3 | Verify log content format | Contains INFO/DEBUG/WARNING entries with timestamps |
+
+**Duration**: ~10s
+**Pass criteria**: Log file exists with correct date-based naming. Content includes structured log entries.
@@ -0,0 +1,41 @@
+# E2E Test Data Management
+
+## Seed Data Sets
+
+| Data Set | Description | Used by Tests | How Loaded | Cleanup |
+|----------|-------------|---------------|-----------|---------|
+| onnx-model | Small YOLO ONNX model (valid architecture, 1280×1280 input, 19 classes) | All detection tests | Volume mount to mock-loader `/models/azaion.onnx` | Container restart |
+| classes-json | classes.json with 19 detection classes, 3 weather modes, MaxSizeM values | All tests | Volume mount to detections `/app/classes.json` | Container restart |
+| small-image | JPEG image 640×480 — below 1.5× model size (1920×1920 threshold) | FT-P-03, FT-P-05, FT-P-06, FT-P-07, FT-N-01, FT-N-02, NFT-PERF-01 | Volume mount to consumer `/media/` | N/A (read-only) |
+| large-image | JPEG image 4000×3000 — above 1.5× model size, triggers tiling | FT-P-04, FT-P-16, NFT-PERF-03 | Volume mount to consumer `/media/` | N/A (read-only) |
+| test-video | MP4 video, 10s duration, 30fps — contains objects across frames | FT-P-10, FT-P-11, FT-P-12, NFT-PERF-04 | Volume mount to consumer `/media/` | N/A (read-only) |
+| empty-image | Zero-byte file | FT-N-01 | Volume mount to consumer `/media/` | N/A (read-only) |
+| corrupt-image | Binary garbage (not valid image format) | FT-N-02 | Volume mount to consumer `/media/` | N/A (read-only) |
+| jwt-token | Valid JWT with exp claim (not signature-verified by detections) | FT-P-08, FT-P-09 | Generated by consumer at runtime | N/A |
+
+## Data Isolation Strategy
+
+Each test run starts with fresh containers (`docker compose down -v && docker compose up`). The detections service is stateless — no persistent data between runs. Mock services reset their state on container restart. Tests that modify mock behavior (e.g., making loader unreachable) must run in isolated test groups.
+
+## Input Data Mapping
+
+| Input Data File | Source Location | Description | Covers Scenarios |
+|-----------------|----------------|-------------|-----------------|
+| data_parameters.md | `_docs/00_problem/input_data/data_parameters.md` | API parameter schemas, config defaults, classes.json structure | Informs all test input construction |
+
+## External Dependency Mocks
+
+| External Service | Mock/Stub | How Provided | Behavior |
+|-----------------|-----------|-------------|----------|
+| Loader Service | HTTP stub | Docker service `mock-loader` | Serves ONNX model from volume on `GET /models/azaion.onnx`. Accepts TensorRT upload on `POST /upload`. Returns 404 for unknown files. Configurable: can simulate downtime (503) via control endpoint `POST /mock/config`. |
+| Annotations Service | HTTP stub | Docker service `mock-annotations` | Accepts annotations on `POST /annotations` and stores them in memory for verification. Provides token refresh on `POST /auth/refresh`. Configurable: can simulate downtime (503) via control endpoint `POST /mock/config`. Returns recorded annotations on `GET /mock/annotations` for test assertions. |
+
+## Data Validation Rules
+
+| Data Type | Validation | Invalid Examples | Expected System Behavior |
+|-----------|-----------|-----------------|------------------------|
+| Image file (POST /detect) | Non-empty bytes, decodable by OpenCV | Zero-byte file, random binary, text file | 400 Bad Request |
+| media_id (POST /detect/{media_id}) | String, unique among active detections | Already-active media_id | 409 Conflict |
+| AIConfigDto fields | probability_threshold: 0.0–1.0; frame_period_recognition: positive int; big_image_tile_overlap_percent: 0–100 | probability_threshold: -1 or 2.0; frame_period_recognition: 0 | System uses defaults or returns validation error |
+| Authorization header | Bearer token format | Missing header, malformed JWT | Token forwarded to Annotations as-is; detection still proceeds |
+| classes.json | JSON array of objects with Id, Name, Color, MaxSizeM | Missing file, empty array, malformed JSON | Service fails to start / returns empty detections |
@@ -0,0 +1,70 @@
+# E2E Traceability Matrix
+
+## Acceptance Criteria Coverage
+
+| AC ID | Acceptance Criterion | Test IDs | Coverage |
+|-------|---------------------|----------|----------|
+| AC-DA-1 | Detections with confidence below probability_threshold are filtered out | FT-P-03, FT-P-05 | Covered |
+| AC-DA-2 | Overlapping detections with containment ratio > tracking_intersection_threshold are deduplicated | FT-P-06 | Covered |
+| AC-DA-3 | Tile duplicate detections identified when bounding box coordinates differ by < 0.01 | FT-P-16 | Covered |
+| AC-DA-4 | Physical size filtering: detections exceeding max_object_size_meters removed | FT-P-07 | Covered |
+| AC-VP-1 | Frame sampling: every Nth frame processed (frame_period_recognition) | FT-P-10, NFT-PERF-04 | Covered |
+| AC-VP-2 | Minimum annotation interval: frame_recognition_seconds between annotations | FT-P-11 | Covered |
+| AC-VP-3 | Tracking: new annotation accepted on movement/confidence change | FT-P-12 | Covered |
+| AC-IP-1 | Images ≤ 1.5× model dimensions processed as single frame | FT-P-03 | Covered |
+| AC-IP-2 | Larger images: tiled based on GSD, tile overlap configurable | FT-P-04, FT-P-16, NFT-PERF-03 | Covered |
+| AC-API-1 | GET /health returns status: "healthy" with aiAvailability | FT-P-01, FT-P-02 | Covered |
+| AC-API-2 | POST /detect returns detections synchronously. Errors: 400, 422, 503 | FT-P-03, FT-N-01, FT-N-02, FT-N-03, NFT-SEC-01 | Covered |
+| AC-API-3 | POST /detect/{media_id} returns immediately with "started". Rejects duplicate with 409 | FT-P-08, FT-N-04, NFT-SEC-03 | Covered |
+| AC-API-4 | GET /detect/stream delivers SSE events. Queue max depth: 100 | FT-P-09, FT-N-08, NFT-RES-LIM-02 | Covered |
+| AC-EL-1 | Engine initialization is lazy (first detection, not startup) | FT-P-01, FT-P-14 | Covered |
+| AC-EL-2 | Status transitions: NONE → DOWNLOADING → ENABLED / ERROR | FT-P-02, FT-P-14, FT-N-03, NFT-RES-03 | Covered |
+| AC-EL-3 | GPU check: NVIDIA GPU with compute capability ≥ 6.1 | FT-P-15 | Covered |
+| AC-EL-4 | TensorRT conversion uses FP16 when GPU supports it | — | NOT COVERED — requires specific GPU hardware; verified by visual inspection of TensorRT build logs |
+| AC-EL-5 | Background conversion does not block API responsiveness | FT-P-01, FT-P-14 | Covered |
+| AC-LOG-1 | Log files: Logs/log_inference_YYYYMMDD.txt | NFT-RES-LIM-04 | Covered |
+| AC-LOG-2 | Rotation: daily. Retention: 30 days | NFT-RES-LIM-04 | Covered |
+| AC-OC-1 | 19 base classes, 3 weather modes, up to 57 variants | FT-P-13 | Covered |
+| AC-OC-2 | Each class has Id, Name, Color, MaxSizeM | FT-P-07, FT-P-13 | Covered |
+
+## Restrictions Coverage
+
+| Restriction ID | Restriction | Test IDs | Coverage |
+|---------------|-------------|----------|----------|
+| RESTRICT-HW-1 | GPU CC ≥ 6.1 required for TensorRT | FT-P-15 | Covered |
+| RESTRICT-HW-2 | TensorRT conversion uses 90% GPU memory workspace | — | NOT COVERED — requires controlled GPU memory environment; verified during manual engine build |
+| RESTRICT-HW-3 | ThreadPoolExecutor limited to 2 workers | NFT-PERF-02, NFT-RES-LIM-01 | Covered |
+| RESTRICT-SW-1 | Python 3 + Cython 3.1.3 compilation required | — | NOT COVERED — build-time constraint; verified by Docker build succeeding |
+| RESTRICT-SW-2 | ONNX model (azaion.onnx) must be available via Loader | FT-N-06, NFT-RES-01, NFT-RES-03 | Covered |
+| RESTRICT-SW-3 | TensorRT engines are GPU-architecture-specific (not portable) | — | NOT COVERED — requires multiple GPU architectures; documented constraint |
+| RESTRICT-SW-4 | classes.json must exist at startup | FT-N-05 | Covered |
+| RESTRICT-SW-5 | Model input: fixed 1280×1280 | FT-P-03, FT-P-04 | Covered |
+| RESTRICT-SW-6 | Max 300 detections per frame | NFT-RES-LIM-03 | Covered |
+| RESTRICT-ENV-1 | LOADER_URL must be reachable for model download | FT-N-06, NFT-RES-01, NFT-RES-03 | Covered |
+| RESTRICT-ENV-2 | ANNOTATIONS_URL must be reachable for result posting | FT-N-07, NFT-RES-02 | Covered |
+| RESTRICT-ENV-3 | Logs/ directory must be writable | NFT-RES-LIM-04 | Covered |
+| RESTRICT-OP-1 | Stateless — no local persistence of detection results | NFT-RES-04 | Covered |
+| RESTRICT-OP-2 | No TLS at application level | — | NOT COVERED — infrastructure-level concern; out of scope for application E2E tests |
+| RESTRICT-OP-3 | No CORS configuration | — | NOT COVERED — requires browser-based testing; out of scope for API-level E2E |
+| RESTRICT-OP-4 | No rate limiting | NFT-SEC-02 | Covered |
+| RESTRICT-OP-5 | No graceful shutdown — in-progress detections not drained | NFT-RES-04 | Covered |
+| RESTRICT-OP-6 | Single-instance in-memory state (not shared across instances) | NFT-RES-04 | Covered |
+
+## Coverage Summary
+
+| Category | Total Items | Covered | Not Covered | Coverage % |
+|----------|-----------|---------|-------------|-----------|
+| Acceptance Criteria | 22 | 21 | 1 | 95% |
+| Restrictions | 18 | 13 | 5 | 72% |
+| **Total** | **40** | **34** | **6** | **85%** |
+
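The percentages in the summary table can be re-derived from the covered/total counts. The snippet below is only an arithmetic check; `coverage_percent` is a hypothetical helper, and rounding to the nearest whole percent is assumed.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Coverage as a whole-number percentage (nearest integer)."""
    return round(100 * covered / total)


# (covered, total) pairs taken from the two coverage tables above.
counts = {"Acceptance Criteria": (21, 22), "Restrictions": (13, 18)}

summary = {name: coverage_percent(c, t) for name, (c, t) in counts.items()}
total_covered = sum(c for c, _ in counts.values())
total_items = sum(t for _, t in counts.values())
summary["Total"] = coverage_percent(total_covered, total_items)
print(summary)
```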
+## Uncovered Items Analysis
+
+| Item | Reason Not Covered | Risk | Mitigation |
+|------|-------------------|------|-----------|
+| AC-EL-4 (FP16 TensorRT) | Requires specific GPU with FP16 support; E2E test cannot control hardware capabilities | Low — TensorRT builder auto-detects FP16 | Verified during manual TensorRT build; logged by engine |
+| RESTRICT-HW-2 (90% GPU memory) | Requires controlled GPU memory environment with specific memory sizes | Low — hardcoded workspace fraction | Verified by observing TensorRT build logs on target hardware |
+| RESTRICT-SW-1 (Cython compilation) | Build-time constraint, not runtime behavior | Low — Docker build validates this | Docker build step serves as the validation gate |
+| RESTRICT-SW-3 (TensorRT non-portable) | Requires multiple GPU architectures in test environment | Low — engine filename encodes architecture | Architecture-specific filenames prevent incorrect loading |
+| RESTRICT-OP-2 (No TLS) | Infrastructure-level concern; application does not implement TLS | None — by design | TLS handled by reverse proxy / service mesh in deployment |
+| RESTRICT-OP-3 (No CORS) | Browser-specific concern; API-level E2E tests don't use browsers | Low — known limitation | Can be tested separately with browser automation if needed |
@@ -0,0 +1,68 @@
+# Module: ai_availability_status
+
+## Purpose
+
+Thread-safe status tracker for the AI engine lifecycle (downloading, converting, uploading, enabled, warning, error).
+
+## Public Interface
+
+### Enum: AIAvailabilityEnum
+
+| Value | Name | Meaning |
+|-------|------|---------|
+| 0 | NONE | Initial state, not yet initialized |
+| 10 | DOWNLOADING | Model download in progress |
+| 20 | CONVERTING | ONNX-to-TensorRT conversion in progress |
+| 30 | UPLOADING | Converted model upload in progress |
+| 200 | ENABLED | Engine ready for inference |
+| 300 | WARNING | Operational with warnings |
+| 500 | ERROR | Failed, not operational |
+
+### Class: AIAvailabilityStatus
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | AIAvailabilityEnum | Current status |
+| `error_message` | str or None | Error/warning details |
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `()` | Sets status=NONE, error_message=None |
+| `__str__` | `() -> str` | Thread-safe formatted string: `"StatusText ErrorText"` |
+| `serialize` | `() -> bytes` | Thread-safe msgpack serialization `{s: status, m: error_message}` **(legacy — not called in current codebase)** |
+| `set_status` | `(AIAvailabilityEnum status, str error_message=None) -> void` | Thread-safe status update; logs via constants_inf (error or info) |
+
+## Internal Logic
+
+All public methods acquire a `threading.Lock` before reading/writing status fields. `set_status` logs the transition: errors go to `constants_inf.logerror`, normal transitions go to `constants_inf.log`.
+
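The locking pattern described above can be sketched in plain Python. This is not the Cython implementation: the standard `logging` module stands in for `constants_inf`, the exact `__str__` text is assumed, and treating a non-empty `error_message` as the trigger for error-level logging is also an assumption.

```python
import logging
import threading
from enum import IntEnum
from typing import Optional

logger = logging.getLogger("ai_status_sketch")  # stand-in for constants_inf


class AIAvailabilityEnum(IntEnum):
    NONE = 0
    DOWNLOADING = 10
    CONVERTING = 20
    UPLOADING = 30
    ENABLED = 200
    WARNING = 300
    ERROR = 500


class AIAvailabilityStatus:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.status = AIAvailabilityEnum.NONE
        self.error_message: Optional[str] = None

    def set_status(self, status: AIAvailabilityEnum,
                   error_message: Optional[str] = None) -> None:
        # Hold the lock only while the two fields change together.
        with self._lock:
            self.status = status
            self.error_message = error_message
        if error_message is not None:  # assumption: a message marks an error
            logger.error("AI status %s: %s", status.name, error_message)
        else:
            logger.info("AI status %s", status.name)

    def __str__(self) -> str:
        with self._lock:
            text, error = self.status.name, self.error_message or ""
        return f"{text} {error}".strip()


tracker = AIAvailabilityStatus()
tracker.set_status(AIAvailabilityEnum.DOWNLOADING)
print(tracker)
```

Reading both fields under the same lock in `__str__` keeps a reader from seeing a new status paired with a stale error message.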
+## Dependencies
+
+- **External**: `msgpack`, `threading`
+- **Internal**: `constants_inf` (logging)
+
+## Consumers
+
+- `inference` — creates instance, calls `set_status` during engine lifecycle, exposes `ai_availability_status` for health checks
+- `main` — reads `ai_availability_status` via inference for `/health` endpoint
+
+## Data Models
+
+- `AIAvailabilityEnum` — status enum
+- `AIAvailabilityStatus` — stateful status holder
+
+## Configuration
+
+None.
+
+## External Integrations
+
+None.
+
+## Security
+
+Thread-safe via Lock — safe for concurrent access from FastAPI async + ThreadPoolExecutor.
+
+## Tests
+
+None found.
@@ -0,0 +1,69 @@
+# Module: ai_config
+
+## Purpose
+
+Data class holding all AI recognition configuration parameters, with factory methods for deserialization from msgpack and dict formats.
+
+## Public Interface
+
+### Class: AIRecognitionConfig
+
+#### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `frame_period_recognition` | int | 4 | Process every Nth frame in video |
+| `frame_recognition_seconds` | double | 2.0 | Minimum seconds between valid video annotations |
+| `probability_threshold` | double | 0.25 | Minimum detection confidence |
+| `tracking_distance_confidence` | double | 0.0 | Distance threshold for tracking (model-width units) |
+| `tracking_probability_increase` | double | 0.0 | Required confidence increase for tracking update |
+| `tracking_intersection_threshold` | double | 0.6 | Containment-ratio threshold (overlap area / smaller box area) for overlapping detection removal |
+| `file_data` | bytes | `b''` | Raw file data (msgpack use) |
+| `paths` | list[str] | `[]` | Media file paths to process |
+| `model_batch_size` | int | 1 | Batch size for inference |
+| `big_image_tile_overlap_percent` | int | 20 | Tile overlap percentage for large image splitting |
+| `altitude` | double | 400 | Camera altitude in meters |
+| `focal_length` | double | 24 | Camera focal length in mm |
+| `sensor_width` | double | 23.5 | Camera sensor width in mm |
+
+#### Methods
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `from_msgpack` | `(bytes data) -> AIRecognitionConfig` | Static cdef; deserializes from msgpack binary |
+| `from_dict` | `(dict data) -> AIRecognitionConfig` | Static def; deserializes from Python dict |
+
+## Internal Logic
+
+Both factory methods apply defaults for missing keys. `from_msgpack` uses compact abbreviated keys (`f_pr`, `pt`, `t_dc`, etc.) while `from_dict` uses full descriptive keys.
+
+**Legacy/unused**: `from_msgpack()` is defined but never called in the current codebase — it is a remnant of a previous queue-based architecture. Only `from_dict()` is actively used. The `file_data` field is stored but never read anywhere.
+
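As an illustration of the defaults-for-missing-keys behavior, here is a plain-Python sketch using a dataclass. The real class is a Cython type. The legacy `file_data` field is omitted, and silently ignoring unknown keys is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class AIRecognitionConfig:
    # Defaults mirror the Fields table above.
    frame_period_recognition: int = 4
    frame_recognition_seconds: float = 2.0
    probability_threshold: float = 0.25
    tracking_distance_confidence: float = 0.0
    tracking_probability_increase: float = 0.0
    tracking_intersection_threshold: float = 0.6
    paths: List[str] = field(default_factory=list)
    model_batch_size: int = 1
    big_image_tile_overlap_percent: int = 20
    altitude: float = 400.0
    focal_length: float = 24.0
    sensor_width: float = 23.5

    @staticmethod
    def from_dict(data: Dict[str, Any]) -> "AIRecognitionConfig":
        # Keys absent from `data` fall back to the dataclass defaults.
        known = {f.name for f in fields(AIRecognitionConfig)}
        # Assumption: keys that are not fields are dropped silently.
        return AIRecognitionConfig(**{k: v for k, v in data.items() if k in known})


config = AIRecognitionConfig.from_dict({"probability_threshold": 0.5,
                                        "paths": ["/media/test-video.mp4"]})
print(config.probability_threshold, config.frame_period_recognition)
```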
+## Dependencies
+
+- **External**: `msgpack`
+- **Internal**: none (leaf module)
+
+## Consumers
+
+- `inference` — creates config from dict, uses all fields for frame selection, detection filtering, image tiling, and tracking
+
+## Data Models
+
+- `AIRecognitionConfig` — the sole data class
+
+## Configuration
+
+Camera/altitude parameters (`altitude`, `focal_length`, `sensor_width`) are used for ground sampling distance calculation in aerial image processing.
+
+## External Integrations
+
+None.
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,83 @@
+# Module: annotation
+
+## Purpose
+
+Data models for object detections and annotations (grouped detections for a frame/tile with metadata).
+
+## Public Interface
+
+### Class: Detection
+
+Represents a single bounding box detection in normalized coordinates.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `x` | double | Center X (normalized 0..1) |
+| `y` | double | Center Y (normalized 0..1) |
+| `w` | double | Width (normalized 0..1) |
+| `h` | double | Height (normalized 0..1) |
+| `cls` | int | Class ID (maps to constants_inf.annotations_dict) |
+| `confidence` | double | Detection confidence (0..1) |
+| `annotation_name` | str | Parent annotation name (set after construction) |
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(double x, y, w, h, int cls, double confidence)` | Constructor |
+| `__str__` | `() -> str` | Format: `"{cls}: {x} {y} {w} {h}, prob: {confidence}%"` |
+| `__eq__` | `(other) -> bool` | Two detections are equal if all bbox coordinates differ by less than `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` |
+| `overlaps` | `(Detection det2, float confidence_threshold) -> bool` | Returns True if the containment ratio (overlap area / smaller box area) exceeds the threshold |
+
+### Class: Annotation
+
+Groups detections for a single frame or image tile.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `name` | str | Unique annotation name (encodes tile/time info) |
+| `original_media_name` | str | Source media filename (without extension/spaces) |
+| `time` | long | Timestamp in milliseconds (video) or 0 (image) |
+| `detections` | list[Detection] | Detections found in this frame/tile |
+| `image` | bytes | JPEG-encoded frame image (set after validation) |
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(str name, str original_media_name, long ms, list[Detection] detections)` | Sets annotation_name on all detections |
+| `__str__` | `() -> str` | Formatted detection summary |
+| `serialize` | `() -> bytes` | Msgpack serialization with compact keys **(legacy — not called in current codebase)** |
+
+## Internal Logic
+
+- `Detection.__eq__` uses `constants_inf.TILE_DUPLICATE_CONFIDENCE_THRESHOLD` (0.01) to determine if two detections at absolute coordinates are duplicates across adjacent tiles.
+- `Detection.overlaps` computes the overlap as `overlap_area / min(area1, area2)` — this is not standard IoU but a containment-biased metric.
+- `Annotation.__init__` sets `annotation_name` on every child detection.
+
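The difference between the containment-biased metric and standard IoU is easiest to see with a small box fully inside a large one. The sketch below uses a hypothetical `Box` type in the same center format as `Detection`. The function names are illustrative and are not the module's API.

```python
from dataclasses import dataclass

TILE_DUPLICATE_CONFIDENCE_THRESHOLD = 0.01


@dataclass
class Box:
    x: float  # center X, normalized
    y: float  # center Y, normalized
    w: float
    h: float


def _overlap_area(a: Box, b: Box) -> float:
    left = max(a.x - a.w / 2, b.x - b.w / 2)
    right = min(a.x + a.w / 2, b.x + b.w / 2)
    top = max(a.y - a.h / 2, b.y - b.h / 2)
    bottom = min(a.y + a.h / 2, b.y + b.h / 2)
    return max(0.0, right - left) * max(0.0, bottom - top)


def containment_ratio(a: Box, b: Box) -> float:
    """Overlap area divided by the smaller box's area."""
    smaller = min(a.w * a.h, b.w * b.h)
    return _overlap_area(a, b) / smaller if smaller > 0 else 0.0


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union, shown only for contrast."""
    overlap = _overlap_area(a, b)
    union = a.w * a.h + b.w * b.h - overlap
    return overlap / union if union > 0 else 0.0


def is_tile_duplicate(a: Box, b: Box,
                      tol: float = TILE_DUPLICATE_CONFIDENCE_THRESHOLD) -> bool:
    """True when every bbox coordinate differs by less than tol."""
    pairs = ((a.x, b.x), (a.y, b.y), (a.w, b.w), (a.h, b.h))
    return all(abs(p - q) < tol for p, q in pairs)


big = Box(0.5, 0.5, 0.4, 0.4)
small = Box(0.5, 0.5, 0.1, 0.1)
print(containment_ratio(big, small), iou(big, small))
```

For this pair the containment ratio is about 1.0 while IoU is about 0.06, so a 0.6 threshold removes the nested box under the containment metric but would keep it under IoU.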
+## Dependencies
+
+- **External**: `msgpack`
+- **Internal**: `constants_inf` (TILE_DUPLICATE_CONFIDENCE_THRESHOLD constant)
+
+## Consumers
+
+- `inference` — creates Detection and Annotation instances during postprocessing, uses overlaps for NMS, uses equality for tile dedup
+- `main` — reads Detection fields for DTO conversion
+
+## Data Models
+
+- `Detection` — bounding box + class + confidence
+- `Annotation` — frame/tile container for detections + metadata + image
+
+## Configuration
+
+None.
+
+## External Integrations
+
+None.
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,95 @@
+# Module: constants_inf
+
+## Purpose
+
+Application-wide constants, logging infrastructure, and the object detection class registry loaded from `classes.json`.
+
+## Public Interface
+
+### Constants
+
+| Name | Type | Value | Description |
+|------|------|-------|-------------|
+| `CONFIG_FILE` | str | `"config.yaml"` | Configuration file path |
+| `QUEUE_CONFIG_FILENAME` | str | `"secured-config.json"` | Queue config filename |
+| `AI_ONNX_MODEL_FILE` | str | `"azaion.onnx"` | ONNX model filename |
+| `CDN_CONFIG` | str | `"cdn.yaml"` | CDN configuration file |
+| `MODELS_FOLDER` | str | `"models"` | Directory for model files |
+| `SMALL_SIZE_KB` | int | `3` | Small file size threshold (KB) |
+| `SPLIT_SUFFIX` | str | `"!split!"` | Delimiter in tiled image names |
+| `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` | double | `0.01` | Threshold for tile duplicate detection equality |
+| `METERS_IN_TILE` | int | `25` | Physical tile size in meters for large image splitting |
+| `weather_switcher_increase` | int | `20` | Offset between weather mode class ID ranges |
+
+### Enum: WeatherMode
+
+| Value | Name | Meaning |
+|-------|------|---------|
+| 0 | Norm | Normal weather |
+| 20 | Wint | Winter |
+| 40 | Night | Night |
+
+### Class: AnnotationClass
+
+Fields: `id` (int), `name` (str), `color` (str), `max_object_size_meters` (int).
+
+Represents a detection class with its display metadata and physical size constraint.
+
+### Functions
+
+| Function | Signature | Description |
+|----------|-----------|-------------|
+| `log` | `(str log_message) -> void` | Info-level log via loguru |
+| `logerror` | `(str error) -> void` | Error-level log via loguru |
+| `format_time` | `(int ms) -> str` | Converts milliseconds to compact time string `HMMSSf` |
+
+### Global: `annotations_dict`
+
+`dict[int, AnnotationClass]` — loaded at module init from `classes.json`. Contains 19 base classes × 3 weather modes (Norm/Wint/Night) = up to 57 entries. Keys are class IDs, values are `AnnotationClass` instances.
+
+## Internal Logic
+
+- On import, reads `classes.json` and builds `annotations_dict` by iterating over the 3 weather mode offsets (0, 20, 40) and adding each offset to the base class IDs. Weather mode names are appended to class names for non-Norm modes.
+- Configures loguru with:
+  - File sink: `Logs/log_inference_YYYYMMDD.txt` (daily rotation, 30-day retention)
+  - Stdout: INFO/DEBUG/SUCCESS levels
+  - Stderr: WARNING and above
+
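The registry construction described above can be sketched as follows. The two sample classes, their colors, and the `Name(Mode)` suffix format are hypothetical. The source says only that the mode name is appended for non-Norm modes.

```python
import json
from dataclasses import dataclass
from typing import Dict

WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}


@dataclass
class AnnotationClass:
    id: int
    name: str
    color: str
    max_object_size_meters: int


def build_annotations_dict(classes_json: str) -> Dict[int, AnnotationClass]:
    """Expand each base class into one entry per weather mode."""
    base_classes = json.loads(classes_json)
    registry: Dict[int, AnnotationClass] = {}
    for mode, offset in WEATHER_OFFSETS.items():
        for item in base_classes:
            # Assumed suffix format; Norm keeps the plain name.
            name = item["Name"] if mode == "Norm" else f'{item["Name"]}({mode})'
            class_id = item["Id"] + offset
            registry[class_id] = AnnotationClass(
                class_id, name, item["Color"], item["MaxSizeM"])
    return registry


# Hypothetical two-class file; the real classes.json has 19 entries.
sample = json.dumps([
    {"Id": 0, "Name": "Vehicle", "Color": "#ff0000", "MaxSizeM": 7},
    {"Id": 1, "Name": "Person", "Color": "#00ff00", "MaxSizeM": 2},
])
annotations_dict = build_annotations_dict(sample)
print(sorted(annotations_dict))
```

With 19 base classes the same loop yields the 57 entries mentioned above.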
+## Legacy / Orphaned Declarations
+
+The `.pxd` header declares `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, and `ANNOTATIONS_QUEUE` (with comments referencing RabbitMQ) that are **not defined** in the `.pyx` implementation. These are remnants of a previous queue-based architecture and are unused.
+
+## Dependencies
+
+- **External**: `json`, `sys`, `loguru`
+- **Internal**: none (leaf module)
+
+## Consumers
+
+- `ai_availability_status` (logging)
+- `annotation` (tile duplicate threshold)
+- `onnx_engine` (logging)
+- `tensorrt_engine` (logging)
+- `inference` (logging, constants, annotations_dict, format_time, SPLIT_SUFFIX, METERS_IN_TILE, MODELS_FOLDER, AI_ONNX_MODEL_FILE)
+- `main` (annotations_dict for label lookup)
+
+## Data Models
+
+- `AnnotationClass` — detection class metadata
+- `WeatherMode` — enum for weather conditions
+
+## Configuration
+
+- Reads `classes.json` at import time (must exist in working directory)
+
+## External Integrations
+
+None.
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,107 @@
+# Module: inference
+
+## Purpose
+
+Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).
+
+## Public Interface
+
+### Class: Inference
+
+#### Fields
+
+| Field | Type | Access | Description |
+|-------|------|--------|-------------|
+| `loader_client` | object | internal | LoaderHttpClient instance |
+| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
+| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
+| `stop_signal` | bool | internal | Flag to abort video processing |
+| `model_width` | int | internal | Model input width in pixels |
+| `model_height` | int | internal | Model input height in pixels |
+| `detection_counts` | dict[str, int] | internal | Per-media detection count |
+| `is_building_engine` | bool | internal | True during async TensorRT conversion |
+
+#### Methods
+
+| Method | Signature | Access | Description |
+|--------|-----------|--------|-------------|
+| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
+| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
+| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
+| `stop` | `()` | cpdef | Sets stop_signal to True |
+| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
+| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap BGR to RGB, stack batch |
+| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |
+
+## Internal Logic
+
+### Engine Initialization (`init_ai`)
+
+1. If `_converted_model_bytes` exists → load TensorRT from those bytes
+2. If GPU available → try downloading pre-built TensorRT engine from loader
+3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
+4. If no GPU → load OnnxEngine from ONNX model bytes
+
+### Preprocessing
+
+- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
+- Stack multiple frames via `np.vstack` for batched inference
+
+### Postprocessing
+
+- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
+- Coordinates normalized to 0..1 by dividing by model width/height
+- Converted to center-format (cx, cy, w, h) Detection objects
+- Filtered by `probability_threshold`
+- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
+
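The coordinate conversion and confidence filter can be shown in pure Python for one frame. The function name and tuple layout are illustrative. The overlap-removal step is left out, and the real code operates on engine output arrays rather than lists.

```python
from typing import List, Sequence, Tuple

CenterDetection = Tuple[float, float, float, float, int, float]


def to_center_detections(rows: Sequence[Sequence[float]],
                         model_width: int, model_height: int,
                         probability_threshold: float) -> List[CenterDetection]:
    """Convert [x1, y1, x2, y2, confidence, class_id] rows to normalized center format."""
    detections: List[CenterDetection] = []
    for x1, y1, x2, y2, confidence, class_id in rows:
        if confidence < probability_threshold:
            continue  # below-threshold detections are filtered out
        # Normalize pixel corners to 0..1 using the model input size.
        nx1, nx2 = x1 / model_width, x2 / model_width
        ny1, ny2 = y1 / model_height, y2 / model_height
        cx, cy = (nx1 + nx2) / 2, (ny1 + ny2) / 2
        w, h = nx2 - nx1, ny2 - ny1
        detections.append((cx, cy, w, h, int(class_id), confidence))
    return detections


frame_rows = [
    [320, 320, 960, 960, 0.9, 3],  # kept
    [0, 0, 128, 128, 0.1, 1],      # dropped: confidence below 0.25
]
result = to_center_detections(frame_rows, 1280, 1280, 0.25)
print(result)
```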
+### Image Processing
+
+- Small images (≤1.5× model size): processed as single frame
+- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
+- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
+- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size computed from GSD × pixel dimensions.
+
+### Video Processing
+
+- Frame sampling: every Nth frame (`frame_period_recognition`)
+- Batch accumulation up to engine batch size
+- Annotation validity: must differ from the previous annotation in at least one of these ways:
+  - Time gap ≥ `frame_recognition_seconds`
+  - More detections than previous
+  - Any detection moved beyond `tracking_distance_confidence` threshold
+  - Any detection confidence increased beyond `tracking_probability_increase`
+- Valid frames get JPEG-encoded image attached
+
+### Ground Sampling Distance (GSD)
+
+`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
+
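A worked example of the GSD formula, using the documented camera defaults and the 4000-pixel-wide `large-image` test asset. The function name is illustrative. Units follow the config table: sensor width and focal length in mm, altitude in meters.

```python
METERS_IN_TILE = 25  # physical tile size from constants_inf


def ground_sampling_distance(sensor_width_mm: float, altitude_m: float,
                             focal_length_mm: float, image_width_px: int) -> float:
    """Meters of ground covered by one image pixel."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)


# Defaults: sensor_width=23.5, altitude=400, focal_length=24.
gsd = ground_sampling_distance(23.5, 400, 24, 4000)
tile_px = METERS_IN_TILE / gsd  # tile edge length in pixels

print(f"GSD: {gsd:.4f} m/px, tile size: {tile_px:.1f} px")
```

With these inputs one pixel covers roughly 0.098 m, so a 25-meter tile is about 255 pixels wide.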
+## Dependencies
+
+- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
+- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)
+
+## Consumers
+
+- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`
+
+## Data Models
+
+Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).
+
+## Configuration
+
+All runtime config comes via `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module-level via pynvml).
+
+## External Integrations
+
+- **Loader service** (via loader_client): model download/upload
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,59 @@
+# Module: inference_engine
+
+## Purpose
+
+Abstract base class defining the interface that all inference engine implementations must follow.
+
+## Public Interface
+
+### Class: InferenceEngine
+
+#### Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `batch_size` | int | Number of images per inference batch |
+
+#### Methods
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Stores batch_size |
+| `get_input_shape` | `() -> tuple` | Returns (height, width) of model input. Abstract — raises `NotImplementedError` |
+| `get_batch_size` | `() -> int` | Returns `self.batch_size` |
+| `run` | `(input_data) -> list` | Runs inference on preprocessed input blob. Abstract — raises `NotImplementedError` |
+
+## Internal Logic
+
+Abstract base class. `__init__` stores `batch_size` and `get_batch_size` returns it; the remaining methods (`get_input_shape`, `run`) raise `NotImplementedError` and must be overridden by subclasses (`OnnxEngine`, `TensorRTEngine`).
+
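The interface contract can be sketched in plain Python. `EchoEngine` is a hypothetical subclass used only to show which methods must be overridden. It is not one of the real engines.

```python
class InferenceEngine:
    """Plain-Python sketch of the base interface."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.batch_size = batch_size

    def get_input_shape(self) -> tuple:
        raise NotImplementedError

    def get_batch_size(self) -> int:
        return self.batch_size

    def run(self, input_data) -> list:
        raise NotImplementedError


class EchoEngine(InferenceEngine):
    """Hypothetical engine that returns its input unchanged."""

    def get_input_shape(self) -> tuple:
        return (1280, 1280)  # (height, width)

    def run(self, input_data) -> list:
        return [input_data]


engine = EchoEngine(b"", batch_size=4)
print(engine.get_input_shape(), engine.get_batch_size())
```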
+## Dependencies
+
+- **External**: `numpy` (declared in .pxd, not used in base)
+- **Internal**: none (leaf module)
+
+## Consumers
+
+- `onnx_engine` — subclass
+- `tensorrt_engine` — subclass
+- `inference` — type reference in .pxd
+
+## Data Models
+
+None.
+
+## Configuration
+
+None.
+
+## External Integrations
+
+None.
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,61 @@
+# Module: loader_http_client
+
+## Purpose
+
+HTTP client for downloading and uploading model files (and other binary resources) via an external Loader microservice.
+
+## Public Interface
+
+### Class: LoadResult
+
+Simple result wrapper.
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `err` | str or None | Error message if operation failed |
+| `data` | bytes or None | Response payload on success |
+
+### Class: LoaderHttpClient
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(str base_url)` | Stores base URL, strips trailing slash |
+| `load_big_small_resource` | `(str filename, str directory) -> LoadResult` | POST to `/load/{filename}` with JSON body `{filename, folder}`, returns raw bytes |
+| `upload_big_small_resource` | `(bytes content, str filename, str directory) -> LoadResult` | POST to `/upload/{filename}` with multipart file + form data `{folder}` |
+| `stop` | `() -> None` | No-op placeholder |
+
+## Internal Logic
+
+Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Errors are logged via loguru but never raised.
+
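The error-or-data convention can be sketched without any HTTP dependency. Here `safe_load` is a hypothetical helper that wraps an arbitrary fetch callable. The real client performs the request with `requests` and logs the error before returning.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoadResult:
    err: Optional[str] = None
    data: Optional[bytes] = None


def safe_load(fetch: Callable[[], bytes]) -> LoadResult:
    """Run fetch and convert any exception into LoadResult.err."""
    try:
        return LoadResult(data=fetch())
    except Exception as exc:  # never propagate to the caller
        return LoadResult(err=str(exc))


def unreachable_loader() -> bytes:
    raise ConnectionError("loader unreachable")


print(safe_load(lambda: b"model-bytes"))
print(safe_load(unreachable_loader))
```

Callers then branch on `result.err` instead of wrapping every loader call in their own `try` block.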
+## Dependencies
+
+- **External**: `requests`, `loguru`
+- **Internal**: none (leaf module)
+
+## Consumers
+
+- `inference` — downloads ONNX/TensorRT models, uploads converted TensorRT engines
+- `main` — instantiates client with `LOADER_URL`
+
+## Data Models
+
+- `LoadResult` — operation result with error-or-data semantics
+
+## Configuration
+
+- `base_url` — provided at construction time, sourced from `LOADER_URL` environment variable in `main.py`
+
+## External Integrations
+
+| Integration | Protocol | Endpoint Pattern |
+|-------------|----------|-----------------|
+| Loader service | HTTP POST | `/load/{filename}` (download), `/upload/{filename}` (upload) |
+
+## Security
+
+None (no auth headers sent to loader).
+
+## Tests
+
+None found.
@@ -0,0 +1,115 @@
+# Module: main
+
+## Purpose
+
+FastAPI application entry point — exposes HTTP API for object detection on images and video media, health checks, and Server-Sent Events (SSE) streaming of detection results.
+
+## Public Interface
+
+### API Endpoints
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Returns AI engine availability status |
+| POST | `/detect` | Single image detection (multipart file upload) |
+| POST | `/detect/{media_id}` | Start async detection on media from loader service |
+| GET | `/detect/stream` | SSE stream of detection events |
+
+### DTOs (Pydantic Models)
+
+| Model | Fields | Description |
+|-------|--------|-------------|
+| `DetectionDto` | centerX, centerY, width, height, classNum, label, confidence | Single detection result |
+| `DetectionEvent` | annotations (list[DetectionDto]), mediaId, mediaStatus, mediaPercent | SSE event payload |
+| `HealthResponse` | status, aiAvailability, errorMessage | Health check response |
+| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width, paths | Configuration input for media detection |
+
+### Class: TokenManager
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(str access_token, str refresh_token)` | Stores tokens |
+| `get_valid_token` | `() -> str` | Returns access_token; auto-refreshes if expiring within 60s |
+
+## Internal Logic
+
+### `/health`
+
+Returns `HealthResponse` with `status="healthy"` always. `aiAvailability` reflects the engine's `AIAvailabilityStatus`. On exception, returns `aiAvailability="None"`.
+
+### `/detect` (single image)
+
+1. Reads uploaded file bytes
+2. Parses optional JSON config
+3. Runs `inference.detect_single_image` in ThreadPoolExecutor (max 2 workers)
+4. Returns list of DetectionDto
+
+Error mapping: RuntimeError("not available") → 503, RuntimeError → 422, ValueError → 400.
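
A small sketch of this mapping as a pure function. The name `status_for_exception` is hypothetical, and matching on the `"not available"` substring is an assumption about how the two `RuntimeError` cases are told apart.

```python
def status_for_exception(exc: Exception) -> int:
    """Map an exception raised during detection to an HTTP status code."""
    if isinstance(exc, RuntimeError):
        # Assumption: engine unavailability is signalled through the message text.
        if "not available" in str(exc):
            return 503
        return 422
    if isinstance(exc, ValueError):
        return 400
    # Anything else is not part of the documented mapping, so re-raise it.
    raise exc
```

The order matters: the more specific `RuntimeError("not available")` check has to run before the generic `RuntimeError` branch.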
+
+### `/detect/{media_id}` (async media)
+
+1. Checks for duplicate active detection (409 if already running)
+2. Extracts auth tokens from Authorization header and x-refresh-token header
+3. Creates `asyncio.Task` for background detection
+4. Detection runs `inference.run_detect` in ThreadPoolExecutor
+5. Callbacks push `DetectionEvent` to all SSE queues
+6. If auth token present, also POSTs annotations to the Annotations service
+7. Returns immediately: `{"status": "started", "mediaId": media_id}`
+
+### `/detect/stream` (SSE)
+
+- Creates asyncio.Queue per client (maxsize=100)
+- Yields `data: {json}\n\n` SSE format
+- Cleans up queue on disconnect
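
The per-client queue fan-out can be sketched with the standard library alone. `format_sse` and `broadcast` are hypothetical helper names; dropping events for a full queue follows the "SSE queue full" row in the flow documentation, and the tiny `maxsize=1` queue exists only to trigger that case.

```python
import asyncio
import json


def format_sse(payload: dict) -> str:
    """Serialize one event in the `data: {json}\\n\\n` wire format."""
    return f"data: {json.dumps(payload)}\n\n"


def broadcast(queues: list, event: dict) -> int:
    """Push an event to every client queue; skip clients whose queue is full."""
    delivered = 0
    for q in queues:
        try:
            q.put_nowait(event)
            delivered += 1
        except asyncio.QueueFull:
            pass  # slow client: the event is dropped for this client only
    return delivered


async def demo() -> tuple:
    fast = asyncio.Queue(maxsize=100)  # size used by the service
    slow = asyncio.Queue(maxsize=1)    # artificially small to show the drop
    queues = [fast, slow]
    broadcast(queues, {"mediaId": "m1", "mediaPercent": 50})
    delivered = broadcast(queues, {"mediaId": "m1", "mediaPercent": 100})
    first = await fast.get()
    return delivered, format_sse(first)


delivered, frame = asyncio.run(demo())
```

Using `put_nowait` keeps one slow consumer from blocking the detection callback for everyone else.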
+
+### Token Management
+
+- Decodes JWT exp claim from base64 payload (no signature verification)
+- Auto-refreshes via POST to `{ANNOTATIONS_URL}/auth/refresh` when within 60s of expiry
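
The expiry check can be illustrated as below, assuming a standard three-part JWT with a base64url payload. `jwt_exp`, `needs_refresh` and `make_unsigned_token` are hypothetical names; the last one only builds a throwaway unsigned token for the example.

```python
import base64
import json


def jwt_exp(token: str) -> int:
    """Read the exp claim from the payload segment without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(payload["exp"])


def needs_refresh(token: str, now: float, margin_s: int = 60) -> bool:
    """True when the token expires within margin_s seconds of now."""
    return jwt_exp(token) - now <= margin_s


def make_unsigned_token(exp: int) -> str:
    """Build a header.payload. token with an empty signature (demo only)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc({'exp': exp})}."


now = 1_000_000
expiring = make_unsigned_token(now + 30)
fresh = make_unsigned_token(now + 3600)
```

Because the signature is never checked, this logic is only suitable for scheduling a refresh, not for deciding whether a token is trustworthy.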
+
+### Annotations Service Integration
+
+- POST to `{ANNOTATIONS_URL}/annotations` with:
+  - `mediaId`, `source: 0`, `videoTime` (formatted from ms), `detections` (list of dto dicts)
+  - Optional base64-encoded `image`
+  - Bearer token in Authorization header
+
+## Dependencies
+
+- **External**: `requests`, `fastapi`, `pydantic` (third-party); `asyncio`, `base64`, `json`, `os`, `time`, `concurrent.futures`, `typing` (standard library)
+- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation)
+
+## Consumers
+
+None (entry point).
+
+## Data Models
+
+- `DetectionDto`, `DetectionEvent`, `HealthResponse`, `AIConfigDto` — Pydantic models for API
+- `TokenManager` — JWT token lifecycle
+
+## Configuration
+
+| Env Var | Default | Description |
+|---------|---------|-------------|
+| `LOADER_URL` | `http://loader:8080` | Loader service base URL |
+| `ANNOTATIONS_URL` | `http://annotations:8080` | Annotations service base URL |
+
+## External Integrations
+
+| Service | Protocol | Purpose |
+|---------|----------|---------|
+| Loader | HTTP (via LoaderHttpClient) | Model loading |
+| Annotations | HTTP POST | Auth refresh (`/auth/refresh`), annotation posting (`/annotations`) |
+
+## Security
+
+- Bearer token from request headers, refreshed via Annotations service
+- JWT exp decoded (base64, no signature verification) — token validation is not performed locally
+- No CORS configuration
+- No rate limiting
+- No input validation on media_id path parameter beyond string type
+
+## Tests
+
+None found.
@@ -0,0 +1,51 @@
+# Module: onnx_engine
+
+## Purpose
+
+ONNX Runtime-based inference engine — CPU/CUDA fallback when TensorRT is unavailable.
+
+## Public Interface
+
+### Class: OnnxEngine (extends InferenceEngine)
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA > CPU provider priority. Reads input shape and batch size from model metadata. |
+| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
+| `get_batch_size` | `() -> int` | Returns batch size (from model if not dynamic, else from constructor) |
+| `run` | `(input_data) -> list` | Runs session inference, returns output tensors |
+
+## Internal Logic
+
+- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]`. ONNX Runtime uses the first provider in this list that is available, so CUDA is preferred and CPU is the fallback.
+- If the model's batch dimension is dynamic (-1), uses the constructor's `batch_size` parameter.
+- Logs model input metadata and custom metadata map at init.
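
The batch-size rule can be sketched as a small helper. `resolve_batch_size` is a hypothetical name, and treating a symbolic name or `None` as dynamic (in addition to `-1`) is an assumption about how the runtime may report a dynamic dimension.

```python
def resolve_batch_size(model_batch_dim, requested: int = 1) -> int:
    """Prefer a fixed batch dimension from the model, else the constructor value."""
    if isinstance(model_batch_dim, int) and model_batch_dim > 0:
        return model_batch_dim
    # Dynamic dimension (-1, a symbolic name, or None): use the requested size.
    return requested
```

With this rule a model exported with a fixed batch of 8 ignores the constructor argument, while a dynamically batched model honors it.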
+
+## Dependencies
+
+- **External**: `onnxruntime`
+- **Internal**: `inference_engine` (base class), `constants_inf` (logging)
+
+## Consumers
+
+- `inference` — instantiated when no compatible NVIDIA GPU is found
+
+## Data Models
+
+None (wraps onnxruntime.InferenceSession).
+
+## Configuration
+
+None.
+
+## External Integrations
+
+None directly — model bytes are provided by caller (loaded via `loader_http_client`).
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,57 @@
+# Module: tensorrt_engine
+
+## Purpose
+
+TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.
+
+## Public Interface
+
+### Class: TensorRTEngine (extends InferenceEngine)
+
+| Method | Signature | Description |
+|--------|-----------|-------------|
+| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes TensorRT engine from bytes, allocates CUDA input/output memory, creates execution context and stream |
+| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
+| `get_batch_size` | `() -> int` | Returns batch size |
+| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy, returns output as numpy array |
+| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (default 2GB if unavailable) |
+| `get_engine_filename` | `(int device_id) -> str` | Static. Returns engine filename with compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
+| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts ONNX model to TensorRT serialized engine. Uses 90% of GPU memory as workspace. Enables FP16 if supported. |
+
+## Internal Logic
+
+- Input shape defaults to 1280×1280 for dynamic dimensions.
+- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
+- `run` uses async CUDA memory transfers with stream synchronization.
+- `convert_from_onnx` uses explicit batch mode, configures FP16 precision when GPU supports it.
+- Default batch size is 4 (vs OnnxEngine's 1).
+
+## Dependencies
+
+- **External**: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
+- **Internal**: `inference_engine` (base class), `constants_inf` (logging)
+
+## Consumers
+
+- `inference` — instantiated when compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`
+
+## Data Models
+
+None (wraps TensorRT runtime objects).
+
+## Configuration
+
+- Engine filename is GPU-specific (compute capability + SM count).
+- Workspace memory is 90% of available GPU memory.
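
Both settings reduce to simple arithmetic once the device properties are known. The sketch below takes them as plain arguments; `engine_filename` and `workspace_bytes` are hypothetical helpers, not the module's static methods, and querying the GPU itself is out of scope here.

```python
def engine_filename(major: int, minor: int, sm_count: int) -> str:
    """Build the GPU-specific engine name from compute capability and SM count."""
    return f"azaion.cc_{major}.{minor}_sm_{sm_count}.engine"


def workspace_bytes(gpu_memory_bytes: int) -> int:
    """Return 90% of the given GPU memory, using integer arithmetic."""
    return gpu_memory_bytes * 9 // 10
```

For example, a device reporting compute capability 8.6 and 82 SMs would map to `azaion.cc_8.6_sm_82.engine`, so engines built on different GPU models do not overwrite each other in the Loader.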
+
+## External Integrations
+
+None directly — model bytes provided by caller.
+
+## Security
+
+None.
+
+## Tests
+
+None found.
@@ -0,0 +1,13 @@
+{
+  "current_step": "complete",
+  "completed_steps": ["discovery", "module-analysis", "component-assembly", "system-synthesis", "verification", "solution-extraction", "problem-extraction", "final-report"],
+  "modules_total": 10,
+  "modules_documented": [
+    "constants_inf", "ai_config", "inference_engine", "loader_http_client",
+    "ai_availability_status", "annotation", "onnx_engine", "tensorrt_engine",
+    "inference", "main"
+  ],
+  "modules_remaining": [],
+  "components_written": ["01_domain", "02_inference_engines", "03_inference_pipeline", "04_api"],
+  "last_updated": "2026-03-21"
+}
@@ -0,0 +1,259 @@
+# Azaion.Detections — System Flows
+
+## Flow Inventory
+
+| # | Flow Name | Trigger | Primary Components | Criticality |
+|---|-----------|---------|-------------------|-------------|
+| F1 | Health Check | Client GET /health | API, Inference Pipeline | High |
+| F2 | Single Image Detection | Client POST /detect | API, Inference Pipeline, Engines, Domain | High |
+| F3 | Media Detection (Async) | Client POST /detect/{media_id} | API, Inference Pipeline, Engines, Domain, Loader, Annotations | High |
+| F4 | SSE Event Streaming | Client GET /detect/stream | API | Medium |
+| F5 | Engine Initialization | First detection request | Inference Pipeline, Engines, Loader | High |
+| F6 | TensorRT Background Conversion | No pre-built TensorRT engine | Inference Pipeline, Engines, Loader | Medium |
+
+## Flow Dependencies
+
+| Flow | Depends On | Shares Data With |
+|------|-----------|-----------------|
+| F1 | F5 (for meaningful status) | — |
+| F2 | F5 (engine must be ready) | — |
+| F3 | F5 (engine must be ready) | F4 (via SSE event queues) |
+| F4 | — | F3 (receives events) |
+| F5 | — | F6 (triggers conversion if needed) |
+| F6 | F5 (triggered when the pre-built TensorRT engine download fails) | F5 (provides converted bytes) |
+
+---
+
+## Flow F1: Health Check
+
+### Description
+
+Client queries the service health status. Returns the current AI engine availability (None, Downloading, Converting, Enabled, Error, etc.) without triggering engine initialization.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant API as main.py
+    participant INF as Inference
+    participant STATUS as AIAvailabilityStatus
+
+    Client->>API: GET /health
+    API->>INF: get_inference()
+    INF-->>API: Inference instance
+    API->>STATUS: str(ai_availability_status)
+    STATUS-->>API: "Enabled" / "Downloading" / etc.
+    API-->>Client: HealthResponse{status, aiAvailability, errorMessage}
+```
+
+### Error Scenarios
+
+| Error | Where | Detection | Recovery |
+|-------|-------|-----------|----------|
+| Inference not yet created | get_inference() | Exception caught | Returns aiAvailability="None" |
+
+---
+
+## Flow F2: Single Image Detection
+
+### Description
+
+Client uploads an image file and optionally provides config. The service runs inference in a ThreadPoolExecutor, waits for it to finish, and returns the detection results in the same response.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant API as main.py
+    participant INF as Inference
+    participant ENG as Engine (ONNX/TRT)
+    participant CONST as constants_inf
+
+    Client->>API: POST /detect (file + config?)
+    API->>API: Read image bytes, parse config
+    API->>INF: detect_single_image(bytes, config_dict)
+    INF->>INF: init_ai() (idempotent)
+    INF->>INF: cv2.imdecode → preprocess
+    INF->>ENG: run(input_blob)
+    ENG-->>INF: raw output
+    INF->>INF: postprocess → filter by threshold → remove overlaps
+    INF-->>API: list[Detection]
+    API->>CONST: annotations_dict[cls].name (label lookup)
+    API-->>Client: list[DetectionDto]
+```
+
+### Error Scenarios
+
+| Error | Where | Detection | Recovery |
+|-------|-------|-----------|----------|
+| Empty image | API | len(bytes)==0 | 400 Bad Request |
+| Invalid image data | imdecode | frame is None | 400 Bad Request (ValueError) |
+| Engine not available | init_ai | engine is None | 503 Service Unavailable |
+| Inference failure | run/postprocess | RuntimeError | 422 Unprocessable Entity |
+
+---
+
+## Flow F3: Media Detection (Async)
+
+### Description
+
+Client triggers detection on media files (images/video) available via the Loader service. Processing runs asynchronously. Results are streamed via SSE (F4) and optionally posted to the Annotations service.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant API as main.py
+    participant INF as Inference
+    participant ENG as Engine
+    participant LDR as Loader Service
+    participant ANN as Annotations Service
+    participant SSE as SSE Queues
+
+    Client->>API: POST /detect/{media_id} (config + auth headers)
+    API->>API: Check _active_detections (duplicate guard)
+    API-->>Client: {"status": "started"}
+
+    Note over API: asyncio.Task created
+
+    API->>INF: run_detect(config, on_annotation, on_status)
+    loop For each media file
+        INF->>INF: Read/decode media (cv2)
+        INF->>INF: Preprocess (tile/batch)
+        INF->>ENG: run(input_blob)
+        ENG-->>INF: raw output
+        INF->>INF: Postprocess + validate
+
+        opt Valid annotation found
+            INF->>API: on_annotation(annotation, percent)
+            API->>SSE: DetectionEvent → all queues
+            opt Auth token present
+                API->>ANN: POST /annotations (detections + image)
+            end
+        end
+    end
+
+    INF->>API: on_status(media_name, count)
+    API->>SSE: DetectionEvent(status=AIProcessed, percent=100)
+```
+
+### Data Flow
+
+| Step | From | To | Data | Format |
+|------|------|----|------|--------|
+| 1 | Client | API | media_id, config, auth tokens | HTTP POST JSON + headers |
+| 2 | API | Inference | config_dict, callbacks | Python dict + callables |
+| 3 | Inference | Engine | preprocessed batch | numpy ndarray |
+| 4 | Engine | Inference | raw detections | numpy ndarray |
+| 5 | Inference | API (callback) | Annotation + percent | Python objects |
+| 6 | API | SSE clients | DetectionEvent | SSE JSON stream |
+| 7 | API | Annotations Service | detections + base64 image | HTTP POST JSON |
+
+### Error Scenarios
+
+| Error | Where | Detection | Recovery |
+|-------|-------|-----------|----------|
+| Duplicate media_id | API | _active_detections check | 409 Conflict |
+| Engine unavailable | run_detect | engine is None | Error event pushed to SSE |
+| Inference failure | processing | Exception | Error event pushed to SSE, media_id cleared |
+| Annotations POST failure | _post_annotation | Exception | Silently caught, detection continues |
+| SSE queue full | event broadcast | QueueFull | Event silently dropped for that client |
+
+---
+
+## Flow F4: SSE Event Streaming
+
+### Description
+
+Client opens a persistent SSE connection. Receives real-time detection events from all active F3 media detection tasks.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant API as main.py
+    participant Queue as asyncio.Queue
+
+    Client->>API: GET /detect/stream
+    API->>Queue: Create queue (maxsize=100)
+    API->>API: Add to _event_queues
+
+    loop Until disconnect
+        Queue-->>API: await event
+        API-->>Client: data: {DetectionEvent JSON}
+    end
+
+    Note over API: Client disconnects (CancelledError)
+    API->>API: Remove from _event_queues
+```
+
+---
+
+## Flow F5: Engine Initialization
+
+### Description
+
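+On the first detection request, the Inference class initializes the ML engine. With a GPU available, it first tries to download a pre-built TensorRT engine; if that fails, it downloads the ONNX model and starts a background TensorRT conversion (F6). Without a GPU, it downloads the ONNX model and creates an `OnnxEngine`.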
+
+### Flowchart
+
+```mermaid
+flowchart TD
+    Start([init_ai called]) --> CheckEngine{engine exists?}
+    CheckEngine -->|Yes| Done([Return])
+    CheckEngine -->|No| CheckBuilding{is_building_engine?}
+    CheckBuilding -->|Yes| Done
+    CheckBuilding -->|No| CheckConverted{_converted_model_bytes?}
+    CheckConverted -->|Yes| LoadConverted[Load TensorRT from bytes]
+    LoadConverted --> SetEnabled[status = ENABLED]
+    SetEnabled --> Done
+
+    CheckConverted -->|No| CheckGPU{GPU available?}
+    CheckGPU -->|Yes| DownloadTRT[Download pre-built TensorRT engine]
+    DownloadTRT --> TRTSuccess{Success?}
+    TRTSuccess -->|Yes| LoadTRT[Create TensorRTEngine]
+    LoadTRT --> SetEnabled
+    TRTSuccess -->|No| DownloadONNX[Download ONNX model]
+    DownloadONNX --> StartConversion[Start background thread: convert ONNX→TRT]
+    StartConversion --> Done
+
+    CheckGPU -->|No| DownloadONNX2[Download ONNX model]
+    DownloadONNX2 --> LoadONNX[Create OnnxEngine]
+    LoadONNX --> Done
+```
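
The branching in the flowchart can be expressed as a pure decision function. This is a sketch under assumptions: the real `init_ai` performs downloads and engine construction inline, whereas here each condition is passed in as a boolean and the `InitAction` names are invented for illustration.

```python
from enum import Enum


class InitAction(Enum):
    NOTHING = "nothing"
    LOAD_CONVERTED = "load_converted_tensorrt"
    LOAD_PREBUILT_TRT = "load_prebuilt_tensorrt"
    START_CONVERSION = "start_background_conversion"
    LOAD_ONNX = "load_onnx"


def choose_init_action(engine_exists: bool, is_building: bool,
                       has_converted_bytes: bool, gpu_available: bool,
                       prebuilt_download_ok: bool) -> InitAction:
    """Mirror the flowchart: return which branch init_ai would take."""
    if engine_exists or is_building:
        return InitAction.NOTHING
    if has_converted_bytes:
        return InitAction.LOAD_CONVERTED
    if gpu_available:
        if prebuilt_download_ok:
            return InitAction.LOAD_PREBUILT_TRT
        # No pre-built engine: download ONNX and convert in the background.
        return InitAction.START_CONVERSION
    return InitAction.LOAD_ONNX
```

Separating the decision from the side effects like this makes each branch easy to unit test without a GPU or a Loader service.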
+
+---
+
+## Flow F6: TensorRT Background Conversion
+
+### Description
+
+When no pre-built TensorRT engine exists, a background daemon thread converts the ONNX model to TensorRT, uploads the result to Loader for caching, and stores the bytes for the next `init_ai` call.
+
+### Sequence Diagram
+
+```mermaid
+sequenceDiagram
+    participant INF as Inference
+    participant TRT as TensorRTEngine
+    participant LDR as Loader Service
+    participant STATUS as AIAvailabilityStatus
+
+    Note over INF: Background thread starts
+    INF->>STATUS: set_status(CONVERTING)
+    INF->>TRT: convert_from_onnx(onnx_bytes)
+    TRT->>TRT: Build TensorRT engine (90% GPU memory workspace)
+    TRT-->>INF: engine_bytes
+
+    INF->>STATUS: set_status(UPLOADING)
+    INF->>LDR: upload_big_small_resource(engine_bytes, filename)
+    LDR-->>INF: LoadResult
+
+    INF->>INF: _converted_model_bytes = engine_bytes
+    INF->>STATUS: set_status(ENABLED)
+    Note over INF: Next init_ai() call will load from _converted_model_bytes
+```
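
The sequence above can be sketched with a daemon thread and injected stand-ins for the converter and uploader. `ConversionState` and `convert_in_background` are hypothetical names, statuses are plain strings rather than the real status enum, and the case where conversion returns `None` is not handled.

```python
import threading


class ConversionState:
    """Holds what the background thread reports back to the pipeline."""

    def __init__(self):
        self.statuses = []
        self.converted_model_bytes = None


def convert_in_background(state, onnx_bytes, convert, upload) -> threading.Thread:
    """Convert, upload, then store the bytes for the next init call."""
    def worker():
        state.statuses.append("CONVERTING")
        engine_bytes = convert(onnx_bytes)
        state.statuses.append("UPLOADING")
        upload(engine_bytes)
        state.converted_model_bytes = engine_bytes
        state.statuses.append("ENABLED")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


uploaded = []
state = ConversionState()
thread = convert_in_background(
    state, b"onnx", lambda b: b"trt:" + b, uploaded.append
)
thread.join()  # the service does not join; this is only for the demo
```

Storing the bytes before flipping the status to enabled means a reader that sees the enabled status can rely on the converted model being present.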
@@ -0,0 +1,175 @@
+# Test Infrastructure
+
+**Task**: AZ-138_test_infrastructure
+**Name**: Test Infrastructure
+**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
+**Complexity**: 5 points
+**Dependencies**: None
+**Component**: Integration Tests
+**Jira**: AZ-138
+**Epic**: AZ-137
+
+## Test Project Folder Layout
+
+```
+e2e/
+├── conftest.py
+├── requirements.txt
+├── Dockerfile
+├── pytest.ini
+├── mocks/
+│   ├── loader/
+│   │   ├── Dockerfile
+│   │   └── app.py
+│   └── annotations/
+│       ├── Dockerfile
+│       └── app.py
+├── fixtures/
+│   ├── small_image.jpg          (640×480 JPEG with detectable objects)
+│   ├── large_image.jpg          (4000×3000 JPEG for tiling tests)
+│   ├── test_video.mp4           (10s, 30fps MP4 with moving objects)
+│   ├── empty_image              (zero-byte file)
+│   ├── corrupt_image            (random binary garbage)
+│   ├── classes.json             (19 classes, 3 weather modes, MaxSizeM values)
+│   └── azaion.onnx              (small valid YOLO ONNX model, 1280×1280 input, 19 classes)
+├── tests/
+│   ├── test_health_engine.py
+│   ├── test_single_image.py
+│   ├── test_tiling.py
+│   ├── test_async_sse.py
+│   ├── test_video.py
+│   ├── test_negative.py
+│   ├── test_resilience.py
+│   ├── test_performance.py
+│   ├── test_security.py
+│   └── test_resource_limits.py
+└── docker-compose.test.yml
+```
+
+### Layout Rationale
+
+- `mocks/` separated from tests — each mock is a standalone Docker service with its own Dockerfile
+- `fixtures/` holds all static test data, volume-mounted into containers
+- `tests/` organized by test category matching the test spec structure (one file per task group)
+- `conftest.py` provides shared pytest fixtures (HTTP clients, SSE helpers, service readiness checks)
+- `pytest.ini` configures markers for `gpu`/`cpu` profiles and test ordering
+
+## Mock Services
+
+| Mock Service | Replaces | Endpoints | Behavior |
+|-------------|----------|-----------|----------|
+| mock-loader | Loader service (model download/upload) | `POST /load/{filename}` — serves the ONNX model from the volume (the path `LoaderHttpClient` calls). `POST /upload/{filename}` — accepts TensorRT engine upload, stores in memory. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/status` — returns mock state. | Deterministic: serves model file from `/models/` volume. Configurable downtime via control endpoint. First-request-fail mode for retry tests. |
+| mock-annotations | Annotations service (result posting, token refresh) | `POST /annotations` — accepts annotation POST, stores in memory. `POST /auth/refresh` — returns refreshed token. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/annotations` — returns recorded annotations for assertion. | Records all incoming annotations in memory. Provides token refresh. Configurable downtime. Assertions via GET endpoint to verify what was received. |
+
+### Mock Control API
+
+Both mock services expose:
+- `POST /mock/config` — accepts JSON `{"mode": "normal"|"error"|"first_fail"}` to control behavior
+- `POST /mock/reset` — clears recorded state (annotations, uploads)
+- `GET /mock/status` — returns current mode and recorded interaction count
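
The three modes can be modeled as a small state object that a mock's request handlers would consult. This is a framework-free sketch: `MockBehavior` is a hypothetical class, returning 503 for failures follows the "simulate 503" wording above, and having `reset` also return the mode to `normal` is an assumption.

```python
class MockBehavior:
    MODES = ("normal", "error", "first_fail")

    def __init__(self):
        self.mode = "normal"
        self.interactions = 0
        self._first_failure_pending = False

    def configure(self, mode: str) -> None:
        """Equivalent of POST /mock/config with {"mode": ...}."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._first_failure_pending = mode == "first_fail"

    def status_for_next_request(self) -> int:
        """Record one interaction and return the HTTP status to answer with."""
        self.interactions += 1
        if self.mode == "error":
            return 503
        if self._first_failure_pending:
            self._first_failure_pending = False
            return 503
        return 200

    def reset(self) -> None:
        """Equivalent of POST /mock/reset (assumed to restore normal mode too)."""
        self.mode = "normal"
        self.interactions = 0
        self._first_failure_pending = False

    def status(self) -> dict:
        """Equivalent of GET /mock/status."""
        return {"mode": self.mode, "interactions": self.interactions}
```

In `first_fail` mode only the first request after configuration fails, which is what a retry test needs to observe one failure followed by a success.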
+
+## Docker Test Environment
+
+### docker-compose.test.yml Structure
+
+| Service | Image / Build | Purpose | Depends On |
+|---------|--------------|---------|------------|
+| detections | Build from repo root (Dockerfile) | System under test — FastAPI detection service | mock-loader, mock-annotations |
+| mock-loader | Build from `e2e/mocks/loader/` | Serves ONNX model, accepts TensorRT uploads | — |
+| mock-annotations | Build from `e2e/mocks/annotations/` | Accepts annotation results, provides token refresh | — |
+| e2e-consumer | Build from `e2e/` | pytest test runner | detections |
+
+### Networks and Volumes
+
+**Network**: `e2e-net` — isolated bridge network, all services communicate via hostnames
+
+**Volumes**:
+
+| Volume | Mount Target | Content |
+|--------|-------------|---------|
+| test-models | mock-loader:/models | `azaion.onnx` model file |
+| test-media | e2e-consumer:/media | Test images and video files |
+| test-classes | detections:/app/classes.json | `classes.json` with 19 detection classes |
+| test-results | e2e-consumer:/results | CSV test report output |
+
+### GPU Profile
+
+Two Docker Compose profiles:
+- **cpu** (default): `detections` runs without GPU runtime, exercises ONNX fallback path
+- **gpu**: `detections` runs with `runtime: nvidia` and `NVIDIA_VISIBLE_DEVICES=all`, exercises TensorRT path
+
+### Environment Variables (detections service)
+
+| Variable | Value | Purpose |
+|----------|-------|---------|
+| LOADER_URL | http://mock-loader:8080 | Points to mock Loader |
+| ANNOTATIONS_URL | http://mock-annotations:8081 | Points to mock Annotations |
+
+## Test Runner Configuration
+
+**Framework**: pytest
+**Plugins and libraries**: pytest-csv (reporting), pytest-timeout (per-test timeouts), requests (HTTP client), sseclient-py (SSE streaming)
+**Entry point**: `pytest --csv=/results/report.csv -v`
+
+### Fixture Strategy
+
+| Fixture | Scope | Purpose |
+|---------|-------|---------|
+| `base_url` | session | Detections service base URL (`http://detections:8000`) |
+| `http_client` | session | `requests.Session` configured with base URL and default timeout |
+| `sse_client_factory` | function | Factory that opens SSE connection to `/detect/stream` |
+| `mock_loader_url` | session | Mock-loader base URL for control API calls |
+| `mock_annotations_url` | session | Mock-annotations base URL for control API and assertion calls |
+| `wait_for_services` | session (autouse) | Polls health endpoints until all services are ready |
+| `reset_mocks` | function (autouse) | Calls `POST /mock/reset` on both mocks before each test |
+| `small_image` | session | Reads `small_image.jpg` from `/media/` volume |
+| `large_image` | session | Reads `large_image.jpg` from `/media/` volume |
+| `test_video_path` | session | Path to `test_video.mp4` on host filesystem |
+| `empty_image` | session | Reads zero-byte file |
+| `corrupt_image` | session | Reads random binary file |
+| `jwt_token` | function | Generates a valid JWT with exp claim for auth tests |
+| `warm_engine` | module | Sends one detection request to initialize engine, used by tests that need warm engine |
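
The readiness polling behind `wait_for_services` could look roughly like the helper below. It is a sketch only: `wait_until_ready` is a hypothetical name, the probe is injected so the example runs without a network, and the timeout and interval defaults are assumptions. In `conftest.py` it would be wrapped in a session-scoped autouse fixture whose probe issues a GET against each health endpoint.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 60.0,
                     interval_s: float = 1.0, sleep=time.sleep,
                     clock=time.monotonic) -> bool:
    """Call probe() until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # a service that is still starting may refuse connections
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demo probe: not ready twice, then ready.
attempts = iter([False, False, True])
became_ready = wait_until_ready(
    lambda: next(attempts), timeout_s=5, interval_s=0, sleep=lambda s: None
)
never_ready = wait_until_ready(
    lambda: False, timeout_s=0, interval_s=0, sleep=lambda s: None
)
```

Treating probe exceptions as "not ready yet" keeps connection errors during container startup from failing the whole session early.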
+
+## Test Data Fixtures
+
+| Data Set | Source | Format | Used By |
+|----------|--------|--------|---------|
+| azaion.onnx | Pre-built small YOLO model | ONNX (1280×1280 input, 19 classes) | All detection tests (via mock-loader) |
+| classes.json | Static fixture | JSON (19 objects with Id, Name, Color, MaxSizeM) | All tests (volume mount to detections) |
+| small_image.jpg | Static fixture | JPEG 640×480 | Health, single image, filtering, negative, performance tests |
+| large_image.jpg | Static fixture | JPEG 4000×3000 | Tiling tests, performance tests |
+| test_video.mp4 | Static fixture | MP4 10s 30fps | Async, SSE, video processing tests |
+| empty_image | Static fixture | Zero-byte file | FT-N-01 |
+| corrupt_image | Static fixture | Random binary | FT-N-02 |
+
+### Data Isolation
+
+Each test run starts with fresh containers (`docker compose down -v && docker compose up`). The detections service is stateless — no persistent data between runs. Mock services reset state via `POST /mock/reset` before each test. Tests that modify mock behavior (e.g., making loader unreachable) run with function-scoped mock resets.
+
+## Test Reporting
+
+**Format**: CSV
+**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
+**Output path**: `/results/report.csv` → mounted to `./e2e-results/report.csv` on host
+
+## Acceptance Criteria
+
+**AC-1: Test environment starts**
+Given the docker-compose.test.yml
+When `docker compose -f docker-compose.test.yml up` is executed
+Then all services start and the detections service is reachable at http://detections:8000/health
+
+**AC-2: Mock services respond**
+Given the test environment is running
+When the e2e-consumer sends requests to mock-loader and mock-annotations
+Then mock services respond with configured behavior and record interactions
+
+**AC-3: Test runner executes**
+Given the test environment is running
+When the e2e-consumer starts
+Then pytest discovers and executes test files from `tests/` directory
+
+**AC-4: Test report generated**
+Given tests have been executed
+When the test run completes
+Then `/results/report.csv` exists with columns: Test ID, Test Name, Execution Time, Result, Error Message
@@ -0,0 +1,47 @@
+# Autopilot State
+
+## Current Step
+step: 2d
+name: Decompose Tests
+status: in_progress
+sub_step: 1t — Test Infrastructure Bootstrap
+
+## Step ↔ SubStep Reference
+| Step | Name                   | Sub-Skill                        | Internal SubSteps                        |
+|------|------------------------|----------------------------------|------------------------------------------|
+| 0    | Problem                | problem/SKILL.md                 | Phase 1–4                                |
+| 1    | Research               | research/SKILL.md                | Mode A: Phase 1–4 · Mode B: Step 0–8    |
+| 2    | Plan                   | plan/SKILL.md                    | Step 1–6                                 |
+| 2b   | Blackbox Test Spec     | blackbox-test-spec/SKILL.md      | Phase 1a–1b (existing code path only)    |
+| 2c   | Post-Test-Spec Decision| (autopilot decision gate)        | Refactor vs normal workflow              |
+| 2d   | Decompose Tests        | decompose/SKILL.md (tests-only)  | Step 1t + Step 3 + Step 4                |
+| 2e   | Implement Tests        | implement/SKILL.md               | (batch-driven, no fixed sub-steps)       |
+| 3    | Decompose              | decompose/SKILL.md               | Step 1–4                                 |
+| 4    | Implement              | implement/SKILL.md               | (batch-driven, no fixed sub-steps)       |
+| 5    | Deploy                 | deploy/SKILL.md                  | Step 1–7                                 |
+
+## Completed Steps
+
+| Step | Name | Completed | Key Outcome |
+|------|------|-----------|-------------|
+| — | Document (pre-step) | 2026-03-21 | 10 modules, 4 components, full _docs/ generated from existing codebase |
+| 2b | Blackbox Test Spec | 2026-03-21 | 39 test scenarios (16 positive, 8 negative, 11 non-functional), 85% total coverage, 5 artifacts produced |
+| 2c | Post-Test-Spec Decision | 2026-03-22 | User chose refactor path (A) |
+
+## Key Decisions
+- User chose B: Document existing codebase before proceeding
+- Component breakdown: 4 components (Domain, Inference Engines, Inference Pipeline, API)
+- Verification: 4 legacy issues found and documented (unused serialize/from_msgpack, orphaned queue declarations)
+- Input data coverage approved at ~90% (Phase 1a)
+- Test coverage approved at 85% (21/22 AC, 13/18 restrictions) with all gaps justified
+- User chose A: Refactor path (decompose tests → implement tests → refactor)
+- Integration Tests Epic: AZ-137
+
+## Last Session
+date: 2026-03-22
+ended_at: Step 2d Decompose Tests — SubStep 1t Test Infrastructure Bootstrap
+reason: in progress
+notes: Starting tests-only mode decomposition. 39 test scenarios to decompose into atomic tasks.
+
+## Blockers
+- none