Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,129 @@
# Component: Inference Pipeline
## Overview
**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.
**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.
**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).
## Modules
| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |
## Internal Interfaces
### Inference
```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()
    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```
### LoaderHttpClient
```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```
## External API
None — internal component, consumed by API layer.
## Data Access Patterns
- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory
## Implementation Details
### Engine Initialization Strategy
```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If that fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
```
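The fallback decision above can be sketched as a small pure function. This is an illustrative sketch only; the function name, argument names, and return labels are hypothetical, not the actual Cython implementation:

```python
# Hedged sketch of the engine-selection fallback; names are illustrative.
MIN_COMPUTE_CAPABILITY = (6, 1)  # minimum capability per the doc

def select_engine(gpu_capability, trt_engine_available, converted_model_ready):
    """Return which engine path init_ai() would take, as a label.

    gpu_capability: (major, minor) tuple or None if no GPU.
    """
    if gpu_capability is None or gpu_capability < MIN_COMPUTE_CAPABILITY:
        return "onnx-cpu"              # step 3: no suitable GPU
    if trt_engine_available:
        return "tensorrt-prebuilt"     # step 2a: cached engine from Loader
    if converted_model_ready:
        return "tensorrt-converted"    # step 2d: background conversion finished
    return "onnx-gpu-converting"       # steps 2b/2c: run ONNX while converting
```

Subsequent `init_ai()` calls re-evaluate this decision, which is why the pipeline can transparently upgrade from ONNX to a converted TensorRT engine mid-lifetime.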
### Preprocessing
- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`
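A numpy-only sketch of what this step produces (mirroring `cv2.dnn.blobFromImage`: scale to 0..1, BGR→RGB, HWC→NCHW, then batch with `np.vstack`). Resizing to the model input size is omitted here, and the function names are illustrative:

```python
import numpy as np

def to_blob(frame_bgr: np.ndarray) -> np.ndarray:
    """One BGR frame -> 1xCxHxW float blob (resize step omitted)."""
    rgb = frame_bgr[:, :, ::-1]                   # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0       # normalize to 0..1
    return scaled.transpose(2, 0, 1)[np.newaxis]  # HWC -> 1xCxHxW

def batch(frames) -> np.ndarray:
    """Stack per-frame blobs into an N x C x H x W batch."""
    return np.vstack([to_blob(f) for f in frames])
```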
### Postprocessing
- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)
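The greedy overlap filter can be sketched as follows. The `Detection` fields, the `iou` helper, and the function name are illustrative placeholders for the Cython originals:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: float; y: float; w: float; h: float  # center format, normalized 0..1
    confidence: float
    class_id: int

def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a.x - a.w/2, a.y - a.h/2, a.x + a.w/2, a.y + a.h/2
    bx1, by1, bx2, by2 = b.x - b.w/2, b.y - b.h/2, b.x + b.w/2, b.y + b.h/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0

def remove_overlapping(dets, threshold):
    # Greedy: visit by descending confidence, break ties by lower class_id,
    # keep a detection only if it does not overlap anything already kept.
    kept = []
    for d in sorted(dets, key=lambda d: (-d.confidence, d.class_id)):
        if all(iou(d, k) < threshold for k in kept):
            kept.append(d)
    return kept
```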
### Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters
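A worked example of the tiling arithmetic above. `METERS_IN_TILE` is the constant named in the source but its value here is assumed, and the sample sensor parameters are purely illustrative:

```python
METERS_IN_TILE = 100.0  # assumed value, for illustration only

def ground_sampling_distance(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    """Meters of ground covered by one pixel."""
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

def tile_size_px(gsd_m_per_px):
    """Pixels needed so one tile spans METERS_IN_TILE on the ground."""
    return int(METERS_IN_TILE / gsd_m_per_px)

# e.g. 13.2 mm sensor, 100 m altitude, 8.8 mm focal length, 4000 px wide image:
gsd = ground_sampling_distance(0.0132, 100.0, 0.0088, 4000)  # ~0.0375 m/px
```

Note the units cancel: sensor width and focal length can both be in mm or both in m, as long as they match.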
### Video Processing
- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images
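The sampling and a simplified version of the validity heuristics can be sketched like this. The real heuristics also weigh spatial movement and confidence improvement; the function names and the tuple shape are assumptions:

```python
def sample_frames(total_frames: int, every_nth: int):
    """Indices of the frames actually run through inference."""
    return range(0, total_frames, every_nth)

def is_annotation_valid(prev, cur, min_time_gap_s=1.0):
    """Simplified validity check; prev/cur are (timestamp_s, detection_count).

    A new annotation is worth emitting if enough time has passed or if
    more objects were detected than before.
    """
    if prev is None:
        return True  # first annotation is always valid
    time_gap = cur[0] - prev[0] >= min_time_gap_s
    more_detections = cur[1] > prev[1]
    return time_gap or more_detections
```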
### Callbacks
- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete
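How a consumer might wire these callbacks up, following the signatures above. The collecting logic is a placeholder for whatever the API layer actually does with results:

```python
annotations = []  # per-annotation results, in arrival order
progress = []     # progress percentages reported alongside each annotation

def annotation_callback(annotation, percent):
    """Called once per valid annotation as processing advances."""
    annotations.append(annotation)
    progress.append(percent)

def status_callback(media_name, count):
    """Called once when all detections for a media item are complete."""
    print(f"{media_name}: {count} detections complete")

# Passed into the pipeline, e.g.:
# inference.run_detect(config_dict, annotation_callback, status_callback)
```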
## Caveats
- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation
## Dependency Graph
```mermaid
graph TD
inference --> constants_inf
inference --> ai_availability_status
inference --> annotation
inference --> ai_config
inference -.-> onnx_engine
inference -.-> tensorrt_engine
inference --> loader_http_client
```
## Logging Strategy
Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.