# Component: Inference Pipeline

## Overview

**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.

**Pattern**: Façade + Pipeline — the `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.

**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).

**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).

## Modules

| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from the Loader service |

## Internal Interfaces

### Inference

```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()

    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```

### LoaderHttpClient

```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```

## External API

None — internal component, consumed by the API layer.
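The façade's callback-driven contract can be sketched with a stub standing in for the real Cython class. `FakeInference`, the `media` config key, and the callback payloads are illustrative assumptions; only the signature `run_detect(config_dict, annotation_callback, status_callback=None)` comes from the interface above:

```python
# Hypothetical driver sketch of how the API layer calls the Inference
# facade. FakeInference is a stub; the real class initializes an engine,
# processes each media item, and invokes the callbacks.
collected = []

class FakeInference:
    def run_detect(self, config_dict, annotation_callback, status_callback=None):
        media = config_dict.get("media", [])
        for i, name in enumerate(media, start=1):
            percent = int(100 * i / len(media))
            # Called once per valid annotation, with overall progress.
            annotation_callback({"media": name}, percent)
            if status_callback:
                # Called when all detections for a media item are complete.
                status_callback(name, 1)

inf = FakeInference()
inf.run_detect(
    {"media": ["a.jpg", "b.jpg"]},
    annotation_callback=lambda ann, pct: collected.append((ann["media"], pct)),
    status_callback=lambda media_name, count: None,
)
print(collected)  # [('a.jpg', 50), ('b.jpg', 100)]
```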
## Data Access Patterns

- Model bytes downloaded from the Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV `VideoCapture`
- Images read via OpenCV `imread`
- All processing is in-memory

## Implementation Details

### Engine Initialization Strategy

```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If that fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
```

### Preprocessing

- `cv2.dnn.blobFromImage`: normalize to 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`

### Postprocessing

- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format `Detection` objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)

### Large Image Tiling

- Ground Sampling Distance: `GSD = sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate `Detection` equality across adjacent tiles
- Physical size filtering: remove detections exceeding the class `max_object_size_meters`

### Video Processing

- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection-count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images

### Callbacks

- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete

## Caveats

- `ThreadPoolExecutor` with `max_workers=2` limits concurrent inference (set in `main.py`)
- Background
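The greedy overlap filter described under Postprocessing can be sketched as follows. The `Detection` dataclass here is a minimal stand-in for the real domain model, and IoU over center-format boxes is an assumption about how "overlapping" is measured:

```python
# Sketch of greedy overlap removal: keep the higher-confidence detection,
# tie-breaking on the lower class_id. Detection is a minimal stand-in.
from dataclasses import dataclass

@dataclass
class Detection:
    cx: float; cy: float; w: float; h: float  # center-format box, 0..1
    conf: float
    class_id: int

def iou(a, b):
    # Intersection-over-union of two center-format boxes.
    ax1, ay1, ax2, ay2 = a.cx - a.w/2, a.cy - a.h/2, a.cx + a.w/2, a.cy + a.h/2
    bx1, by1, bx2, by2 = b.cx - b.w/2, b.cy - b.h/2, b.cx + b.w/2, b.cy + b.h/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(dets, threshold):
    # Greedy best-first: sort by confidence desc, class_id asc; drop any
    # detection that overlaps an already-kept one above the threshold.
    kept = []
    for d in sorted(dets, key=lambda d: (-d.conf, d.class_id)):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept

dets = [
    Detection(0.50, 0.5, 0.2, 0.2, 0.9, 1),
    Detection(0.51, 0.5, 0.2, 0.2, 0.8, 0),  # overlaps the first, lower conf
    Detection(0.90, 0.9, 0.1, 0.1, 0.8, 2),  # separate object
]
kept = remove_overlapping_detections(dets, threshold=0.5)
print([(d.conf, d.class_id) for d in kept])  # [(0.9, 1), (0.8, 2)]
```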
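The tiling math reduces to a couple of lines; `METERS_IN_TILE` and the camera parameters below are illustrative values, not the real constants:

```python
# Sketch of the GSD and tile-size formulas from the tiling section.
METERS_IN_TILE = 50.0  # assumed value for illustration

def gsd_meters_per_pixel(sensor_width_m, altitude_m, focal_length_m, image_width_px):
    # Ground Sampling Distance: ground meters covered by one pixel.
    return sensor_width_m * altitude_m / (focal_length_m * image_width_px)

def tile_size_px(gsd):
    # Each tile covers METERS_IN_TILE meters of ground.
    return int(METERS_IN_TILE / gsd)

# Illustrative drone-camera parameters: 13.2 mm sensor, 100 m altitude,
# 8.8 mm focal length, 4000 px image width.
gsd = gsd_meters_per_pixel(0.0132, 100.0, 0.0088, 4000)
print(round(gsd, 4), tile_size_px(gsd))  # 0.0375 m/px → 1333 px tiles
```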
TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation

## Dependency Graph

```mermaid
graph TD
    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client
```

## Logging Strategy

Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size-filtering decisions.