Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,95 @@
# Component: Domain Models & Configuration
## Overview
**Purpose**: Provides all data models, enums, constants, detection class registry, and logging infrastructure used across the system.
**Pattern**: Shared kernel — leaf-level types and utilities consumed by all other components.
**Upstream**: None (foundation layer).
**Downstream**: Inference Engines, Inference Pipeline, API.
## Modules
| Module | Role |
|--------|------|
| `constants_inf` | Application constants, logging, detection class registry from `classes.json` |
| `ai_config` | `AIRecognitionConfig` data class with factory methods |
| `ai_availability_status` | Thread-safe `AIAvailabilityStatus` tracker with `AIAvailabilityEnum` |
| `annotation` | `Detection` and `Annotation` data models |
## Internal Interfaces
### constants_inf
```
cdef log(str log_message) -> void
cdef logerror(str error) -> void
cdef format_time(int ms) -> str
annotations_dict: dict[int, AnnotationClass]
```
### ai_config
```
cdef class AIRecognitionConfig:
@staticmethod cdef from_msgpack(bytes data) -> AIRecognitionConfig
@staticmethod def from_dict(dict data) -> AIRecognitionConfig
```
### ai_availability_status
```
cdef class AIAvailabilityStatus:
cdef set_status(AIAvailabilityEnum status, str error_message=None)
cdef bytes serialize()
# __str__ for display
```
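The tracker can be pictured as a plain-Python analogue, a minimal sketch only: the real class is a Cython `cdef` class guarded by `threading.Lock`, and the enum member names here are assumptions based on the `/health` value list in the API component doc.

```python
import threading
from enum import Enum

class AIAvailabilityEnum(Enum):
    # Assumed members, mirroring the /health aiAvailability values
    NONE = "None"
    DOWNLOADING = "Downloading"
    CONVERTING = "Converting"
    UPLOADING = "Uploading"
    ENABLED = "Enabled"
    WARNING = "Warning"
    ERROR = "Error"

class AIAvailabilityStatus:
    """Thread-safe status holder; readers and writers share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = AIAvailabilityEnum.NONE
        self._error_message = None

    def set_status(self, status, error_message=None):
        with self._lock:
            self._status = status
            self._error_message = error_message

    def __str__(self):
        with self._lock:
            if self._error_message:
                return f"{self._status.value}: {self._error_message}"
            return self._status.value
```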
### annotation
```
cdef class Detection:
cdef overlaps(Detection det2, float confidence_threshold) -> bool
# __eq__ for tile deduplication
cdef class Annotation:
cdef bytes serialize()
```
## External API
None — this is a shared kernel, not an externally-facing component.
## Data Access Patterns
- `classes.json` read once at module import time (constants_inf)
- All data is in-memory, no database access
## Implementation Details
- Cython `cdef` classes for performance-critical detection processing
- Thread-safe status tracking via `threading.Lock` in `AIAvailabilityStatus`
- `Detection.__eq__` uses coordinate proximity threshold for tile deduplication
- `Detection.overlaps` uses containment-biased metric (overlap / min_area) rather than standard IoU
- Weather mode system triples the class registry (Norm/Wint/Night offsets of 0/20/40)
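The containment-biased metric can be sketched as follows (corner-format boxes are used here for brevity; the real `Detection` stores center-format coordinates). Unlike IoU, a box fully contained in a larger one scores 1.0 regardless of the size difference, so nested duplicates are caught aggressively.

```python
def containment_overlap(a, b):
    """Overlap ratio biased toward containment: intersection / min(area).

    Boxes are (x1, y1, x2, y2). Standard IoU would divide by the union
    instead, so a small box inside a large one would score near zero.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / min(area_a, area_b)
```

For a 1×1 box inside a 10×10 box this returns 1.0, where IoU would return 0.01.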
## Caveats
- `classes.json` must exist in the working directory at import time — no fallback
- `Detection.__eq__` is designed specifically for tile deduplication, not general equality
- `annotations_dict` is a module-level global — not injectable/configurable at runtime
## Dependency Graph
```mermaid
graph TD
ai_availability_status --> constants_inf
annotation --> constants_inf
ai_config
constants_inf
```
## Logging Strategy
All logging flows through `constants_inf.log` and `constants_inf.logerror`, which delegate to loguru with file rotation and console output.
@@ -0,0 +1,86 @@
# Component: Inference Engines
## Overview
**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
**Upstream**: Domain (constants_inf for logging).
**Downstream**: Inference Pipeline (creates and uses engines).
## Modules
| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |
## Internal Interfaces
### InferenceEngine (abstract)
```
cdef class InferenceEngine:
__init__(bytes model_bytes, int batch_size=1, **kwargs)
cdef tuple get_input_shape() # -> (height, width)
cdef int get_batch_size() # -> batch_size
cdef run(input_data) # -> list of output tensors
```
### OnnxEngine
```
cdef class OnnxEngine(InferenceEngine):
# Implements all base methods
# Provider priority: CUDA > CPU
```
### TensorRTEngine
```
cdef class TensorRTEngine(InferenceEngine):
# Implements all base methods
@staticmethod get_gpu_memory_bytes(int device_id) -> int
@staticmethod get_engine_filename(int device_id) -> str
@staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
## External API
None — internal component consumed by Inference Pipeline.
## Data Access Patterns
- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally
## Implementation Details
- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
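The filename scheme reduces to a simple format string. In the real code the three values come from CUDA device queries (compute capability major/minor and streaming-multiprocessor count); they are passed in here so the sketch needs no GPU.

```python
def engine_filename(cc_major, cc_minor, sm_count):
    """Format the GPU-architecture-specific TensorRT engine filename.

    Engines built for one compute capability / SM count will not load
    on a different architecture, hence the per-GPU cache key.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For an RTX 3090-class device (compute capability 8.6, 82 SMs) this would yield `azaion.cc_8.6_sm_82.engine`.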
## Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
- `pycuda.autoinit` import is required as side-effect (initializes CUDA context)
- The 1280×1280 default for dynamic input shapes is hardcoded, not configurable

## Dependency Graph
```mermaid
graph TD
onnx_engine --> inference_engine
onnx_engine --> constants_inf
tensorrt_engine --> inference_engine
tensorrt_engine --> constants_inf
```
## Logging Strategy
Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.
@@ -0,0 +1,129 @@
# Component: Inference Pipeline
## Overview
**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
## Modules
| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |
## Internal Interfaces
### Inference
```
cdef class Inference:
__init__(loader_client)
cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
cpdef list detect_single_image(bytes image_bytes, dict config_dict)
cpdef stop()
# Internal pipeline stages:
cdef init_ai()
cdef preprocess(frames) -> ndarray
cdef postprocess(output, ai_config) -> list[list[Detection]]
cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
cdef _process_images(AIRecognitionConfig, list[str] paths)
cdef _process_video(AIRecognitionConfig, str video_name)
```
### LoaderHttpClient
```
class LoaderHttpClient:
load_big_small_resource(str filename, str directory) -> LoadResult
upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```
## External API
None — internal component, consumed by API layer.
## Data Access Patterns
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
## Implementation Details
### Engine Initialization Strategy
```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
a. Try loading pre-built TensorRT engine from Loader
b. If fails → download ONNX model → start background conversion thread
c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
a. Download ONNX model from Loader → create OnnxEngine
```
### Preprocessing
- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`
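A rough NumPy equivalent of this step (the real code calls `cv2.dnn.blobFromImage`; resizing is skipped here, so frames are assumed to already match the model input size):

```python
import numpy as np

def preprocess(frames):
    """uint8 HWC BGR frames -> one float32 NCHW RGB batch in [0, 1]."""
    blobs = []
    for frame in frames:
        rgb = frame[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, normalize
        blobs.append(rgb.transpose(2, 0, 1)[None])          # HWC -> 1xCxHxW
    return np.vstack(blobs)                                 # stack along batch axis
```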
### Postprocessing
- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
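The greedy filter can be sketched over plain tuples (a simplification of the `remove_overlapping_detections` method; corner-format boxes stand in for `Detection` objects):

```python
def remove_overlapping(dets, threshold):
    """Greedy filter over (x1, y1, x2, y2, confidence, class_id) tuples:
    visit best-first (higher confidence, ties broken by lower class_id)
    and keep a detection only if it clears every already-kept one."""

    def overlap(a, b):  # containment-biased: intersection / min area
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        if inter == 0.0:
            return 0.0
        return inter / min((a[2] - a[0]) * (a[3] - a[1]),
                           (b[2] - b[0]) * (b[3] - b[1]))

    kept = []
    for det in sorted(dets, key=lambda d: (-d[4], d[5])):
        if all(overlap(det, k) < threshold for k in kept):
            kept.append(det)
    return kept
```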
### Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
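The GSD-to-tile arithmetic works out as below. GSD is meters of ground per image pixel, so a tile spanning a fixed number of meters is that many meters divided by the GSD, in pixels. The `meters_in_tile` default is illustrative, not the real `METERS_IN_TILE` constant.

```python
def tile_size_px(sensor_width_m, altitude_m, focal_length_m, image_width_px,
                 meters_in_tile=100.0):
    """Pixels per tile, derived from the ground sampling distance."""
    gsd = sensor_width_m * altitude_m / (focal_length_m * image_width_px)  # m/px
    return int(meters_in_tile / gsd)
```

For example, a 13.2 mm sensor at 100 m altitude with an 8.8 mm lens and a 4000 px wide image gives a GSD of 0.0375 m/px, so a 100 m tile is 2666 px wide.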
### Video Processing
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
### Callbacks
- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete
## Caveats
- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
## Dependency Graph
```mermaid
graph TD
inference --> constants_inf
inference --> ai_availability_status
inference --> annotation
inference --> ai_config
inference -.-> onnx_engine
inference -.-> tensorrt_engine
inference --> loader_http_client
```
## Logging Strategy
Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.
@@ -0,0 +1,103 @@
# Component: API
## Overview
**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.
**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.
**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
**Downstream**: None (top-level, client-facing).
## Modules
| Module | Role |
|--------|------|
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |
## External API Specification
### GET /health
**Response**: `HealthResponse`
```json
{
"status": "healthy",
"aiAvailability": "Enabled",
"errorMessage": null
}
```
`aiAvailability` values: None, Downloading, Converting, Uploading, Enabled, Warning, Error.
### POST /detect
**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
**Response**: `list[DetectionDto]`
```json
[
{
"centerX": 0.5,
"centerY": 0.5,
"width": 0.1,
"height": 0.1,
"classNum": 0,
"label": "ArmorVehicle",
"confidence": 0.85
}
]
```
**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).
### POST /detect/{media_id}
**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
**Response**: `{"status": "started", "mediaId": "..."}` (202-style).
**Errors**: 409 (duplicate detection for same media_id).
**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.
### GET /detect/stream
**Response**: `text/event-stream` (SSE).
```
data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "mediaPercent": 50}
```
`mediaStatus` values: AIProcessing, AIProcessed, Error.
## Data Access Patterns
- In-memory state:
- `_active_detections: dict[str, bool]` — guards against duplicate media processing
- `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
- No database access
## Implementation Details
- `Inference` is lazy-loaded on first use via `get_inference()` global function
- `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
- SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
- `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
- Annotations posted to external service with base64-encoded frame image
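The signature-free `exp` extraction reduces to decoding the middle JWT segment, something like this minimal stdlib sketch (the real `TokenManager` also schedules the refresh 60s before expiry, which is omitted here):

```python
import base64
import json

def decode_exp(token):
    """Read the `exp` claim from a JWT without verifying the signature:
    base64url-decode only the payload (middle) segment."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]
```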
## Caveats
- No CORS middleware configured
- No rate limiting
- No request body size limits beyond FastAPI defaults
- `_active_detections` is an in-memory dict — not persistent across restarts, not distributed
- SSE queue overflow silently drops events (QueueFull caught and ignored)
- JWT token handling has no signature verification — relies entirely on the Annotations service for auth
- No graceful shutdown handling for in-progress detections
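The drop-on-full SSE broadcast described in the caveats can be sketched as:

```python
import asyncio

def broadcast(event_queues, event):
    """Push one event to every connected SSE client queue; a slow client
    whose queue is full simply misses the event instead of blocking the
    producer."""
    for queue in event_queues:
        try:
            queue.put_nowait(event)
        except asyncio.QueueFull:
            pass  # drop silently for this client only
```

Dropping per-client keeps one stalled consumer from back-pressuring the detection pipeline, at the cost of lossy delivery.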
## Dependency Graph
```mermaid
graph TD
main --> inference
main --> constants_inf
main --> loader_http_client
```
## Logging Strategy
No explicit logging in main.py — errors are caught and returned as HTTP responses. Logging happens in downstream components.