detections/_docs/02_document/FINAL_report.md

# Azaion.Detections — Documentation Report

## Executive Summary

Azaion.Detections is a Python/Cython microservice for automated aerial object detection. It exposes a FastAPI HTTP API that accepts images and video, runs YOLO-based inference through TensorRT (GPU) or ONNX Runtime (CPU fallback), and returns structured detection results. The system supports tiling of large aerial images with tile sizes derived from ground sampling distance (GSD), real-time video processing with frame sampling and tracking heuristics, and Server-Sent Events (SSE) streaming for live detection updates.

The codebase consists of 10 modules (2 Python, 8 Cython) organized into 4 components. It integrates with two external services: a Loader service for model storage and an Annotations service for result persistence. This repository contains no tests and no containerization config, and it still carries several legacy artifacts from a prior RabbitMQ-based architecture.

## Problem Statement

The service automates detection of military and infrastructure objects (19 classes, including vehicles, artillery, trenches, personnel, and camouflage) in aerial imagery and video feeds. It replaces manual analyst review with real-time AI-powered detection, enabling rapid situational awareness for reconnaissance operations.

## Architecture Overview

**Tech stack**: Python 3 + Cython 3.1.3 | FastAPI + Uvicorn | ONNX Runtime 1.22.0 | TensorRT 10.11.0 | OpenCV 4.10.0 | NumPy 2.3.0

**Key architectural decisions**:
1. Cython for performance-critical inference loops
2. Dual engine strategy (TensorRT + ONNX fallback) with automatic conversion and caching
3. Lazy engine initialization for fast API startup
4. GSD-based image tiling for large aerial images
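
To make decision 4 concrete, the following is a minimal sketch of GSD-based tile sizing, not the service's actual implementation. It assumes that a tile should cover a fixed ground extent, so its pixel size is that extent divided by the GSD (metres per pixel). The names `tile_size_px` and `iter_tiles`, the clamp bounds (320 and 1280 px), and the overlap handling are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Tile:
    x: int
    y: int
    width: int
    height: int


def tile_size_px(gsd_m_per_px: float, ground_extent_m: float,
                 min_px: int = 320, max_px: int = 1280) -> int:
    """Pixels one tile needs to span `ground_extent_m` at the given GSD.

    The clamp bounds are assumptions chosen for illustration.
    """
    if gsd_m_per_px <= 0:
        raise ValueError("GSD must be positive")
    size = round(ground_extent_m / gsd_m_per_px)
    return max(min_px, min(max_px, size))


def _starts(length: int, tile: int, step: int) -> List[int]:
    """Tile origins along one axis; the last tile is shifted inward to fit."""
    if length <= tile:
        return [0]
    positions = list(range(0, length - tile, step))
    positions.append(length - tile)
    return positions


def iter_tiles(img_w: int, img_h: int, tile: int,
               overlap_px: int = 0) -> Iterator[Tile]:
    """Yield square tiles that together cover the whole image."""
    if not 0 <= overlap_px < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap_px
    for y in _starts(img_h, tile, step):
        for x in _starts(img_w, tile, step):
            yield Tile(x, y, min(tile, img_w), min(tile, img_h))
```

Under these assumptions, an image captured at 0.05 m/px with a 50 m target extent yields 1000 px tiles, and a 4000×3000 image with 200 px overlap is covered by a 5×4 grid of such tiles.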

## Component Summary

| # | Component | Modules | Purpose | Dependencies |
|---|-----------|---------|---------|-------------|
| 01 | Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, enums, constants, logging, class registry | None (foundation) |
| 02 | Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML inference backends (Strategy pattern) | Domain |
| 03 | Inference Pipeline | inference, loader_http_client | Engine lifecycle, media preprocessing/postprocessing, model loading | Domain, Engines |
| 04 | API | main | HTTP endpoints, SSE streaming, auth token management | Domain, Pipeline |
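
The pluggable-backend arrangement in component 02, combined with the lazy initialization and fallback described under the architectural decisions, could look roughly like the sketch below. It is an illustration only: `InferenceEngine` mirrors the `inference_engine` module name, but `StubGpuEngine`, `StubCpuEngine`, and `LazyEngineHolder` are hypothetical stand-ins, and the background ONNX→TensorRT conversion step is omitted.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable, List, Optional, Sequence


class InferenceEngine(ABC):
    """Common interface every backend implements (the Strategy)."""

    @abstractmethod
    def run(self, batch: Sequence[float]) -> List[float]:
        ...


class StubGpuEngine(InferenceEngine):
    """Stand-in for a TensorRT-backed engine; fails as if no GPU were present."""

    def __init__(self) -> None:
        raise RuntimeError("no CUDA device")

    def run(self, batch: Sequence[float]) -> List[float]:
        raise NotImplementedError


class StubCpuEngine(InferenceEngine):
    """Stand-in for an ONNX Runtime-backed engine."""

    def run(self, batch: Sequence[float]) -> List[float]:
        return [x * 2.0 for x in batch]  # placeholder for real inference


class LazyEngineHolder:
    """Builds the engine on first use, trying factories in priority order."""

    def __init__(self, factories: Sequence[Callable[[], InferenceEngine]]) -> None:
        self._factories = list(factories)
        self._engine: Optional[InferenceEngine] = None
        self._lock = threading.Lock()

    def get(self) -> InferenceEngine:
        with self._lock:  # guard against concurrent first requests
            if self._engine is None:
                self._engine = self._build()
            return self._engine

    def _build(self) -> InferenceEngine:
        errors = []
        for factory in self._factories:
            try:
                return factory()
            except Exception as exc:  # fall through to the next backend
                name = getattr(factory, "__name__", repr(factory))
                errors.append(f"{name}: {exc}")
        raise RuntimeError("no inference engine available: " + "; ".join(errors))
```

Constructing `LazyEngineHolder([StubGpuEngine, StubCpuEngine])` costs nothing at startup. The first `get()` call attempts the GPU stand-in, falls back to the CPU stand-in, and caches the result for later calls.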

## System Flows

| # | Flow | Trigger | Description |
|---|------|---------|-------------|
| F1 | Health Check | GET /health | Returns AI engine availability status |
| F2 | Single Image Detection | POST /detect | Synchronous image inference, returns detections |
| F3 | Media Detection (Async) | POST /detect/{media_id} | Background processing with SSE streaming + Annotations posting |
| F4 | SSE Streaming | GET /detect/stream | Real-time event delivery to connected clients |
| F5 | Engine Initialization | First detection request | TensorRT → ONNX fallback → background conversion |
| F6 | TensorRT Conversion | No cached engine | Background ONNX→TensorRT conversion and upload |
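
Flows F3 and F4 rely on fanning detection events out to connected SSE clients. The sketch below shows one way this can be done with per-client `asyncio` queues and the standard SSE wire format (`data: ...` followed by a blank line). `EventBroadcaster`, `format_sse`, `stream`, and the payload fields are hypothetical; the real service's event schema and queue handling are not shown in this report.

```python
import asyncio
import json
from typing import AsyncIterator, List, Optional


def format_sse(payload: dict, event: str = "") -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"


class EventBroadcaster:
    """In-memory fan-out: one bounded queue per connected client."""

    def __init__(self) -> None:
        self._queues: List[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._queues.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._queues.remove(queue)

    def publish(self, payload: Optional[dict]) -> None:
        """Deliver to every client; `None` is a sentinel that ends a stream."""
        for queue in list(self._queues):
            try:
                queue.put_nowait(payload)
            except asyncio.QueueFull:
                pass  # drop the event for a slow client rather than block


async def stream(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield SSE-formatted chunks until the sentinel arrives."""
    while True:
        payload = await queue.get()
        if payload is None:
            return
        yield format_sse(payload)
```

In a FastAPI handler, a generator like `stream(queue)` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`. Because the queues live in process memory, this design shares the single-instance limitation listed under Risk Observations.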

## Risk Observations

| Risk | Severity | Source |
|------|----------|--------|
| No tests in the codebase | High | Verification (Step 4) |
| No CORS, rate limiting, or request size limits | Medium | Security review (main.py) |
| JWT token handled without signature verification | Medium | Security review (main.py) |
| Legacy unused code (serialize, from_msgpack, queue declarations) | Low | Verification (Step 4) |
| No graceful shutdown for in-progress detections | Medium | Architecture review |
| Single-instance in-memory state (_active_detections, _event_queues) | Medium | Scalability review |
| No Dockerfile or CI/CD config in this repository | Low | Infrastructure review |
| classes.json must exist at startup — no fallback | Low | Reliability review |
| Hardcoded 1280×1280 default for dynamic TensorRT dimensions | Low | Flexibility review |

## Open Questions

1. Where is the Dockerfile / docker-compose.yml for this service? They likely live in a separate infrastructure repository.
2. Is the legacy RabbitMQ code (serialize methods, from_msgpack, queue constants in .pxd) planned for removal?
3. What is the intended scaling model — single instance per GPU, or horizontal scaling with shared state?
4. Should JWT signature verification be added at the detection service level, or is the current pass-through approach intentional?
5. Are there integration or end-to-end tests in a separate repository?

## Artifact Index

| Path | Description |
|------|-------------|
| `_docs/00_problem/problem.md` | Problem statement |
| `_docs/00_problem/restrictions.md` | System restrictions and constraints |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and parameters |
| `_docs/01_solution/solution.md` | Solution description and assessment |
| `_docs/02_document/00_discovery.md` | Codebase discovery (tech stack, dependency graph) |
| `_docs/02_document/modules/*.md` | Per-module documentation (10 modules) |
| `_docs/02_document/components/01_domain/description.md` | Domain component spec |
| `_docs/02_document/components/02_inference_engines/description.md` | Inference Engines component spec |
| `_docs/02_document/components/03_inference_pipeline/description.md` | Inference Pipeline component spec |
| `_docs/02_document/components/04_api/description.md` | API component spec |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram |
| `_docs/02_document/architecture.md` | System architecture document |
| `_docs/02_document/system-flows.md` | System flow diagrams and descriptions |
| `_docs/02_document/data_model.md` | Data model with ERD |
| `_docs/02_document/04_verification_log.md` | Verification pass results |
| `_docs/02_document/FINAL_report.md` | This report |
| `_docs/02_document/state.json` | Documentation process state |