Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,129 @@
# Component: Inference Pipeline
## Overview
**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
## Modules
| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |
## Internal Interfaces
### Inference
```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()
    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```
### LoaderHttpClient
```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```
## External API
None — internal component, consumed by API layer.
## Data Access Patterns
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
## Implementation Details
### Engine Initialization Strategy
```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If that fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
```
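The fallback decision above can be sketched as a small pure function. This is an illustrative sketch only; the function name, argument names, and return labels are hypothetical, not the actual Cython implementation:

```python
# Hedged sketch of the engine-selection fallback; names are illustrative.
MIN_COMPUTE_CAPABILITY = (6, 1)  # minimum capability per the doc

def select_engine(gpu_capability, trt_engine_available, converted_model_ready):
    """Return which engine path init_ai() would take, as a label.

    gpu_capability: (major, minor) tuple or None if no GPU.
    """
    if gpu_capability is None or gpu_capability < MIN_COMPUTE_CAPABILITY:
        return "onnx-cpu"              # step 3: no suitable GPU
    if trt_engine_available:
        return "tensorrt-prebuilt"     # step 2a: cached engine from Loader
    if converted_model_ready:
        return "tensorrt-converted"    # step 2d: background conversion finished
    return "onnx-gpu-converting"       # steps 2b/2c: run ONNX while converting
```

Subsequent `init_ai()` calls re-evaluate this decision, which is why the pipeline can transparently upgrade from ONNX to a converted TensorRT engine mid-lifetime.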
### Preprocessing
- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`
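A numpy-only sketch of what this step produces (mirroring `cv2.dnn.blobFromImage`: scale to 0..1, BGR→RGB, HWC→NCHW, then batch with `np.vstack`). Resizing to the model input size is omitted here, and the function names are illustrative:

```python
import numpy as np

def to_blob(frame_bgr: np.ndarray) -> np.ndarray:
    """One BGR frame -> 1xCxHxW float blob (resize step omitted)."""
    rgb = frame_bgr[:, :, ::-1]                   # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0       # normalize to 0..1
    return scaled.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1xCxHxW

def batch(frames) -> np.ndarray:
    """Stack per-frame blobs into an N x C x H x W batch."""
    return np.vstack([to_blob(f) for f in frames])
```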
### Postprocessing
- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
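The greedy overlap filter can be sketched as follows. The `Detection` fields, the `iou` helper, and the function name are illustrative placeholders for the Cython originals:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: float; y: float; w: float; h: float  # center format, normalized 0..1
    confidence: float
    class_id: int

def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a.x - a.w/2, a.y - a.h/2, a.x + a.w/2, a.y + a.h/2
    bx1, by1, bx2, by2 = b.x - b.w/2, b.y - b.h/2, b.x + b.w/2, b.y + b.h/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0

def remove_overlapping(dets, threshold):
    # Greedy: visit by descending confidence, break ties by lower class_id,
    # keep a detection only if it does not overlap anything already kept.
    kept = []
    for d in sorted(dets, key=lambda d: (-d.confidence, d.class_id)):
        if all(iou(d, k) < threshold for k in kept):
            kept.append(d)
    return kept
```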
### Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
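A worked example of the tiling arithmetic above. `METERS_IN_TILE` is the constant named in the source but its value here is assumed, and the sample sensor parameters are purely illustrative:

```python
METERS_IN_TILE = 100.0  # assumed value, for illustration only

def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """Meters of ground covered by one pixel."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

def tile_size_px(gsd_m_per_px):
    """Pixels needed so one tile spans METERS_IN_TILE on the ground."""
    return int(METERS_IN_TILE / gsd_m_per_px)

# e.g. 13.2 mm sensor, 100 m altitude, 8.8 mm focal length, 4000 px wide image:
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 4000)  # ~0.0375 m/px
```

Note the units cancel: sensor width and focal length can both be in mm or both in m, as long as they match.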
### Video Processing
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
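The sampling and a simplified version of the validity heuristics can be sketched like this. The real heuristics also weigh spatial movement and confidence improvement; the function names and the tuple shape are assumptions:

```python
def sample_frames(total_frames: int, every_nth: int):
    """Indices of the frames actually run through inference."""
    return range(0, total_frames, every_nth)

def is_annotation_valid(prev, cur, min_time_gap_s=1.0):
    """Simplified validity check; prev/cur are (timestamp_s, detection_count).

    A new annotation is worth emitting if enough time has passed or if
    more objects were detected than before.
    """
    if prev is None:
        return True  # first annotation is always valid
    time_gap = cur[0] - prev[0] >= min_time_gap_s
    more_detections = cur[1] > prev[1]
    return time_gap or more_detections
```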
### Callbacks
- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete
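How a consumer might wire these callbacks up, following the signatures above. The collecting logic is a placeholder for whatever the API layer actually does with results:

```python
annotations = []  # per-annotation results, in arrival order
progress = []     # progress percentages reported alongside each annotation

def annotation_callback(annotation, percent):
    """Called once per valid annotation as processing advances."""
    annotations.append(annotation)
    progress.append(percent)

def status_callback(media_name, count):
    """Called once when all detections for a media item are complete."""
    print(f"{media_name}: {count} detections complete")

# Passed into the pipeline, e.g.:
# inference.run_detect(config_dict, annotation_callback, status_callback)
```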
## Caveats
- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
## Dependency Graph
```mermaid
graph TD
inference --> constants_inf
inference --> ai_availability_status
inference --> annotation
inference --> ai_config
inference -.-> onnx_engine
inference -.-> tensorrt_engine
inference --> loader_http_client
```
## Logging Strategy
Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.