Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,95 @@
# Component: Domain Models & Configuration
## Overview
**Purpose**: Provides all data models, enums, constants, detection class registry, and logging infrastructure used across the system.
**Pattern**: Shared kernel — leaf-level types and utilities consumed by all other components.
**Upstream**: None (foundation layer).
**Downstream**: Inference Engines, Inference Pipeline, API.
## Modules
| Module | Role |
|--------|------|
| `constants_inf` | Application constants, logging, detection class registry from `classes.json` |
| `ai_config` | `AIRecognitionConfig` data class with factory methods |
| `ai_availability_status` | Thread-safe `AIAvailabilityStatus` tracker with `AIAvailabilityEnum` |
| `annotation` | `Detection` and `Annotation` data models |
## Internal Interfaces
### constants_inf
```
cdef log(str log_message) -> void
cdef logerror(str error) -> void
cdef format_time(int ms) -> str
annotations_dict: dict[int, AnnotationClass]
```
### ai_config
```
cdef class AIRecognitionConfig:
@staticmethod cdef from_msgpack(bytes data) -> AIRecognitionConfig
@staticmethod def from_dict(dict data) -> AIRecognitionConfig
```
### ai_availability_status
```
cdef class AIAvailabilityStatus:
cdef set_status(AIAvailabilityEnum status, str error_message=None)
cdef bytes serialize()
# __str__ for display
```
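The tracker can be pictured as a plain-Python analogue, a minimal sketch only: the real class is a Cython `cdef` class guarded by `threading.Lock`, and the enum member names here are assumptions based on the `/health` value list in the API component doc.

```python
import threading
from enum import Enum

class AIAvailabilityEnum(Enum):
    # Assumed members, mirroring the /health aiAvailability values
    NONE = "None"
    DOWNLOADING = "Downloading"
    CONVERTING = "Converting"
    UPLOADING = "Uploading"
    ENABLED = "Enabled"
    WARNING = "Warning"
    ERROR = "Error"

class AIAvailabilityStatus:
    """Thread-safe status holder; readers and writers share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = AIAvailabilityEnum.NONE
        self._error_message = None

    def set_status(self, status, error_message=None):
        with self._lock:
            self._status = status
            self._error_message = error_message

    def __str__(self):
        with self._lock:
            if self._error_message:
                return f"{self._status.value}: {self._error_message}"
            return self._status.value
```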
### annotation
```
cdef class Detection:
cdef overlaps(Detection det2, float confidence_threshold) -> bool
# __eq__ for tile deduplication
cdef class Annotation:
cdef bytes serialize()
```
## External API
None — this is a shared kernel, not an externally-facing component.
## Data Access Patterns
- `classes.json` read once at module import time (constants_inf)
- All data is in-memory, no database access
## Implementation Details
- Cython `cdef` classes for performance-critical detection processing
- Thread-safe status tracking via `threading.Lock` in `AIAvailabilityStatus`
- `Detection.__eq__` uses coordinate proximity threshold for tile deduplication
- `Detection.overlaps` uses containment-biased metric (overlap / min_area) rather than standard IoU
- Weather mode system triples the class registry (Norm/Wint/Night offsets of 0/20/40)
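The containment-biased metric can be sketched as follows (corner-format boxes are used here for brevity; the real `Detection` stores center-format coordinates). Unlike IoU, a box fully contained in a larger one scores 1.0 regardless of the size difference, so nested duplicates are caught aggressively.

```python
def containment_overlap(a, b):
    """Overlap ratio biased toward containment: intersection / min(area).

    Boxes are (x1, y1, x2, y2). Standard IoU would divide by the union
    instead, so a small box inside a large one would score near zero.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / min(area_a, area_b)
```

For a 1×1 box inside a 10×10 box this returns 1.0, where IoU would return 0.01.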
## Caveats
- `classes.json` must exist in the working directory at import time — no fallback
- `Detection.__eq__` is designed specifically for tile deduplication, not general equality
- `annotations_dict` is a module-level global — not injectable/configurable at runtime
## Dependency Graph
```mermaid
graph TD
ai_availability_status --> constants_inf
annotation --> constants_inf
ai_config
constants_inf
```
## Logging Strategy
All logging flows through `constants_inf.log` and `constants_inf.logerror`, which delegate to loguru with file rotation and console output.
@@ -0,0 +1,86 @@
# Component: Inference Engines
## Overview
**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
**Upstream**: Domain (constants_inf for logging).
**Downstream**: Inference Pipeline (creates and uses engines).
## Modules
| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |
## Internal Interfaces
### InferenceEngine (abstract)
```
cdef class InferenceEngine:
__init__(bytes model_bytes, int batch_size=1, **kwargs)
cdef tuple get_input_shape() # -> (height, width)
cdef int get_batch_size() # -> batch_size
cdef run(input_data) # -> list of output tensors
```
### OnnxEngine
```
cdef class OnnxEngine(InferenceEngine):
# Implements all base methods
# Provider priority: CUDA > CPU
```
### TensorRTEngine
```
cdef class TensorRTEngine(InferenceEngine):
# Implements all base methods
@staticmethod get_gpu_memory_bytes(int device_id) -> int
@staticmethod get_engine_filename(int device_id) -> str
@staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
## External API
None — internal component consumed by Inference Pipeline.
## Data Access Patterns
- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally
## Implementation Details
- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
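The filename scheme reduces to a simple format string. In the real code the three values come from CUDA device queries (compute capability major/minor and streaming-multiprocessor count); they are passed in here so the sketch needs no GPU.

```python
def engine_filename(cc_major, cc_minor, sm_count):
    """Format the GPU-architecture-specific TensorRT engine filename.

    Engines built for one compute capability / SM count will not load
    on a different architecture, hence the per-GPU cache key.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For an RTX 3090-class device (compute capability 8.6, 82 SMs) this would yield `azaion.cc_8.6_sm_82.engine`.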
## Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
- `pycuda.autoinit` import is required as side-effect (initializes CUDA context)
- The 1280×1280 default for dynamic input shapes is hardcoded, not configurable

## Dependency Graph
```mermaid
graph TD
onnx_engine --> inference_engine
onnx_engine --> constants_inf
tensorrt_engine --> inference_engine
tensorrt_engine --> constants_inf
```
## Logging Strategy
Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.
@@ -0,0 +1,129 @@
# Component: Inference Pipeline
## Overview
**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
## Modules
| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |
## Internal Interfaces
### Inference
```
cdef class Inference:
__init__(loader_client)
cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
cpdef list detect_single_image(bytes image_bytes, dict config_dict)
cpdef stop()
# Internal pipeline stages:
cdef init_ai()
cdef preprocess(frames) -> ndarray
cdef postprocess(output, ai_config) -> list[list[Detection]]
cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
cdef _process_images(AIRecognitionConfig, list[str] paths)
cdef _process_video(AIRecognitionConfig, str video_name)
```
### LoaderHttpClient
```
class LoaderHttpClient:
load_big_small_resource(str filename, str directory) -> LoadResult
upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```
## External API
None — internal component, consumed by API layer.
## Data Access Patterns
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
## Implementation Details
### Engine Initialization Strategy
```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
a. Try loading pre-built TensorRT engine from Loader
b. If fails → download ONNX model → start background conversion thread
c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
a. Download ONNX model from Loader → create OnnxEngine
```
### Preprocessing
- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`
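A rough NumPy equivalent of this step (the real code calls `cv2.dnn.blobFromImage`; resizing is skipped here, so frames are assumed to already match the model input size):

```python
import numpy as np

def preprocess(frames):
    """uint8 HWC BGR frames -> one float32 NCHW RGB batch in [0, 1]."""
    blobs = []
    for frame in frames:
        rgb = frame[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, normalize
        blobs.append(rgb.transpose(2, 0, 1)[None])          # HWC -> 1xCxHxW
    return np.vstack(blobs)                                 # stack along batch axis
```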
### Postprocessing
- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
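The greedy filter can be sketched over plain tuples (a simplification of the `remove_overlapping_detections` method; corner-format boxes stand in for `Detection` objects):

```python
def remove_overlapping(dets, threshold):
    """Greedy filter over (x1, y1, x2, y2, confidence, class_id) tuples:
    visit best-first (higher confidence, ties broken by lower class_id)
    and keep a detection only if it clears every already-kept one."""

    def overlap(a, b):  # containment-biased: intersection / min area
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        if inter == 0.0:
            return 0.0
        return inter / min((a[2] - a[0]) * (a[3] - a[1]),
                           (b[2] - b[0]) * (b[3] - b[1]))

    kept = []
    for det in sorted(dets, key=lambda d: (-d[4], d[5])):
        if all(overlap(det, k) < threshold for k in kept):
            kept.append(det)
    return kept
```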
### Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
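The GSD-to-tile arithmetic works out as below. GSD is meters of ground per image pixel, so a tile spanning a fixed number of meters is that many meters divided by the GSD, in pixels. The `meters_in_tile` default is illustrative, not the real `METERS_IN_TILE` constant.

```python
def tile_size_px(sensor_width_m, altitude_m, focal_length_m, image_width_px,
                 meters_in_tile=100.0):
    """Pixels per tile, derived from the ground sampling distance."""
    gsd = sensor_width_m * altitude_m / (focal_length_m * image_width_px)  # m/px
    return int(meters_in_tile / gsd)
```

For example, a 13.2 mm sensor at 100 m altitude with an 8.8 mm lens and a 4000 px wide image gives a GSD of 0.0375 m/px, so a 100 m tile is 2666 px wide.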
### Video Processing
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
### Callbacks
- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete
## Caveats
- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
## Dependency Graph
```mermaid
graph TD
inference --> constants_inf
inference --> ai_availability_status
inference --> annotation
inference --> ai_config
inference -.-> onnx_engine
inference -.-> tensorrt_engine
inference --> loader_http_client
```
## Logging Strategy
Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.
@@ -0,0 +1,103 @@
# Component: API
## Overview
**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.
**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.
**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
**Downstream**: None (top-level, client-facing).
## Modules
| Module | Role |
|--------|------|
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |
## External API Specification
### GET /health
**Response**: `HealthResponse`
```json
{
"status": "healthy",
"aiAvailability": "Enabled",
"errorMessage": null
}
```
`aiAvailability` values: None, Downloading, Converting, Uploading, Enabled, Warning, Error.
### POST /detect
**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
**Response**: `list[DetectionDto]`
```json
[
{
"centerX": 0.5,
"centerY": 0.5,
"width": 0.1,
"height": 0.1,
"classNum": 0,
"label": "ArmorVehicle",
"confidence": 0.85
}
]
```
**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).
### POST /detect/{media_id}
**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
**Response**: `{"status": "started", "mediaId": "..."}` (202-style).
**Errors**: 409 (duplicate detection for same media_id).
**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.
### GET /detect/stream
**Response**: `text/event-stream` (SSE).
```
data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "mediaPercent": 50}
```
`mediaStatus` values: AIProcessing, AIProcessed, Error.
## Data Access Patterns
- In-memory state:
- `_active_detections: dict[str, bool]` — guards against duplicate media processing
- `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
- No database access
## Implementation Details
- `Inference` is lazy-loaded on first use via `get_inference()` global function
- `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
- SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
- `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
- Annotations posted to external service with base64-encoded frame image
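The signature-free `exp` extraction reduces to decoding the middle JWT segment, something like this minimal stdlib sketch (the real `TokenManager` also schedules the refresh 60s before expiry, which is omitted here):

```python
import base64
import json

def decode_exp(token):
    """Read the `exp` claim from a JWT without verifying the signature:
    base64url-decode only the payload (middle) segment."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]
```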
## Caveats
- No CORS middleware configured
- No rate limiting
- No request body size limits beyond FastAPI defaults
- `_active_detections` is an in-memory dict — not persistent across restarts, not distributed
- SSE queue overflow silently drops events (QueueFull caught and ignored)
- JWT token handling has no signature verification — relies entirely on the Annotations service for auth
- No graceful shutdown handling for in-progress detections
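The drop-on-full SSE broadcast described in the caveats can be sketched as:

```python
import asyncio

def broadcast(event_queues, event):
    """Push one event to every connected SSE client queue; a slow client
    whose queue is full simply misses the event instead of blocking the
    producer."""
    for queue in event_queues:
        try:
            queue.put_nowait(event)
        except asyncio.QueueFull:
            pass  # drop silently for this client only
```

Dropping per-client keeps one stalled consumer from back-pressuring the detection pipeline, at the cost of lossy delivery.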
## Dependency Graph
```mermaid
graph TD
main --> inference
main --> constants_inf
main --> loader_http_client
```
## Logging Strategy
No explicit logging in main.py — errors are caught and returned as HTTP responses. Logging happens in downstream components.