
# Azaion.Detections — Architecture

## 1. System Context

Problem being solved: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.

System boundaries:

- Inside: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- Outside: Loader service (model storage), Annotations service (result persistence + auth), client applications

External systems:

| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for the API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |

## 3. Deployment Model

Infrastructure: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).

Environment-specific configuration:

| Config | Development | Production |
|---|---|---|
| LOADER_URL | http://loader:8080 (default) | Environment variable |
| ANNOTATIONS_URL | http://annotations:8080 (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (Logs/log_inference_YYYYMMDD.txt, 30-day retention) |
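
The hostname defaults above imply a configuration pattern roughly like the following (the variable names come from the table; the actual settings module is not shown in this document):

```python
import os

# Service endpoints: hostname-based defaults suit Docker Compose /
# Kubernetes service discovery; production overrides via environment.
LOADER_URL = os.getenv("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.getenv("ANNOTATIONS_URL", "http://annotations:8080")
```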

## 4. Data Model Overview

Core entities:

| Entity | Description | Owned By Component |
|---|---|---|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |

Key relationships:

- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
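
These relationships can be sketched with plain dataclasses (field names are adapted from the payload description in § 5; the real types are Cython cdef classes, so this is a shape sketch, not the actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationClass:
    class_num: int
    name: str
    color: str          # display color, e.g. "#FF0000"
    max_size_m: float   # max physical object size in meters

@dataclass
class Detection:
    center_x: float
    center_y: float
    width: float
    height: float
    class_num: int      # many-to-one: resolved via annotations_dict lookup
    label: str
    confidence: float

@dataclass
class Annotation:
    media_id: str
    video_time: float
    detections: list = field(default_factory=list)  # one-to-many
```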

Data flow summary:

- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)

## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
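
A minimal sketch of the API → pipeline hand-off: a synchronous Cython call driven from an async FastAPI handler via a bounded executor. Function and variable names are illustrative; the 2-worker cap matches the concurrency target in § 6.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Cap matches the "2 parallel jobs max" NFR: a third request queues
# inside the executor instead of oversubscribing the GPU.
_executor = ThreadPoolExecutor(max_workers=2)

def run_inference(media: bytes) -> list:
    # Stand-in for the synchronous Cython pipeline call.
    return [{"label": "vehicle", "confidence": 0.9}]

async def detect(media: bytes) -> list:
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop keeps serving
    # health checks and SSE streams while inference runs.
    return await loop.run_in_executor(_executor, run_inference, media)
```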

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |

### Annotations Service Contract

Detections → Annotations is the primary outbound integration. During async media detection (POST /detect/{media_id}), each detection batch is posted to the Annotations service for persistence and downstream sync.

Endpoint: POST {ANNOTATIONS_URL}/annotations

Trigger: Each valid annotation batch during F3 (async media detection), only when the original client request included an Authorization header.

Payload sent by Detections: mediaId, source (AI=0), videoTime, list of Detection objects (centerX, centerY, width, height, classNum, label, confidence), and optional base64 image. userId is not included — resolved from the JWT by Annotations. The Annotations API contract also accepts description, affiliation, and combatReadiness on each Detection, but Detections does not populate these.
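
Based on the field list above, the posted body has roughly this shape. This is a hedged sketch: the individual key casings (mediaId, centerX, etc.) come from this section, but the name of the detection-list key, the image key, and the helper function are assumptions.

```python
def build_annotation_payload(media_id, video_time, detections, image_b64=None):
    """Assemble the body for POST {ANNOTATIONS_URL}/annotations (sketch)."""
    # userId is deliberately absent: Annotations resolves it from the JWT.
    payload = {
        "mediaId": media_id,
        "source": 0,             # AI = 0
        "videoTime": video_time,
        "detections": [          # list key name is an assumption
            {
                "centerX": d["cx"], "centerY": d["cy"],
                "width": d["w"], "height": d["h"],
                "classNum": d["class_num"], "label": d["label"],
                "confidence": d["conf"],
            }
            for d in detections
        ],
    }
    if image_b64 is not None:
        payload["image"] = image_b64  # optional base64 frame/tile
    return payload
```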

Responses: 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).

Auth: Bearer JWT forwarded from the client. For long-running video, auto-refreshed via POST {ANNOTATIONS_URL}/auth/refresh (TokenManager, 60s pre-expiry window).
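
The refresh timing can be sketched as follows: decode exp from the JWT payload segment without signature verification, and refresh inside the 60-second pre-expiry window. Function names are illustrative, not the actual TokenManager API.

```python
import base64
import json
import time

def jwt_exp(token: str) -> int:
    # Decode only the payload segment; no signature verification,
    # matching the service's local exp check.
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]

def needs_refresh(token: str, window_s: int = 60) -> bool:
    # True once we are within 60 s of expiry (or already expired).
    return time.time() >= jwt_exp(token) - window_s
```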

Downstream effect (Annotations side):

  1. Annotation persisted to local PostgreSQL (image hashed to XxHash64 ID)
  2. SSE event published to UI subscribers
  3. Annotation ID enqueued to annotations_queue_records → FailsafeProducer → RabbitMQ Stream (azaion-annotations) for central DB sync and AI training

Failure isolation: All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.

See _docs/02_document/modules/main.md § "Annotations Service Integration" for field-level schema detail.

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
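
The SSE queue-depth target amounts to a bounded per-client queue, sketched below. Names are illustrative, and the drop-when-full policy is an assumption; the document only states the maxsize.

```python
import asyncio

class SseClient:
    """One subscriber to the detection event stream (sketch)."""

    def __init__(self) -> None:
        # One bounded queue per subscriber; 100 matches the NFR target.
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=100)

    def publish(self, event: dict) -> bool:
        # Non-blocking publish: discard the event if this client lags,
        # so one slow consumer cannot stall inference.
        try:
            self.queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            return False
```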

## 7. Security Architecture

Authentication: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.

Authorization: None at the detection service level. Auth is delegated to the Annotations service.

Data protection:

- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials

Audit logging: Inference activity logged to daily rotated files. No auth audit logging.

## 8. Key Architectural Decisions

### ADR-001: Cython for Inference Pipeline

Context: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.

Decision: Implement the inference pipeline, data models, and engines as Cython cdef classes with typed variables.

Alternatives considered:

  1. Pure Python — rejected: loop-heavy postprocessing is too slow without compiled code
  2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax

Consequences: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.

### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)

Context: Need maximum GPU inference speed where available, but must also run on CPU-only machines.

Decision: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.

Alternatives considered:

  1. TensorRT only — rejected; would break CPU-only development/testing
  2. ONNX only — rejected; significantly slower on GPU vs TensorRT

Consequences: Two code paths to maintain. GPU-specific engine files cached per architecture.
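
The startup decision in ADR-002, combined with the CC ≥ 6.1 check from § 6, reduces to logic like this. The pynvml probe is hedged behind a pure decision function; all names are illustrative.

```python
MIN_COMPUTE_CAPABILITY = (6, 1)  # threshold from the NFR table

def probe_gpu():
    """Return (major, minor) compute capability, or None if no usable GPU."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        return pynvml.nvmlDeviceGetCudaComputeCapability(handle)
    except Exception:
        return None  # no NVIDIA driver / no GPU -> CPU fallback

def choose_engine(capability) -> str:
    # TensorRT only on a sufficiently new NVIDIA GPU;
    # otherwise fall back to ONNX Runtime (CPU or CUDA provider).
    if capability is not None and tuple(capability) >= MIN_COMPUTE_CAPABILITY:
        return "tensorrt"
    return "onnx"
```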

### ADR-003: Lazy Inference Initialization

Context: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.

Decision: Inference is created on first actual detection request, not at app startup. Health endpoint works without engine.

Consequences: First detection request has higher latency. AIAvailabilityStatus reports state transitions during initialization.
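
The lazy-initialization pattern can be sketched as a thread-safe create-on-first-use holder (class and method names are illustrative; the real service additionally reports state transitions via AIAvailabilityStatus):

```python
import threading

class LazyInference:
    """Create the (slow) engine on first detection request, not at startup."""

    def __init__(self, factory):
        self._factory = factory  # builds the engine: download, convert, load
        self._engine = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: concurrent first requests build exactly once,
        # while the health endpoint never has to touch this path at all.
        if self._engine is None:
            with self._lock:
                if self._engine is None:
                    self._engine = self._factory()
        return self._engine
```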

### ADR-004: Large Image Tiling with GSD-Based Sizing

Context: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.

Decision: Split large images into tiles sized by ground sampling distance (METERS_IN_TILE / GSD pixels) with configurable overlap. Deduplicate detections across tile boundaries.

Consequences: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
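
The tile-sizing rule can be sketched as follows. Only the size formula (METERS_IN_TILE / GSD pixels) comes from this document; the 64 m default, the 20% overlap, and the edge-clamping behavior are illustrative assumptions, and cross-tile deduplication is omitted.

```python
def tile_grid(width_px, height_px, gsd_m, meters_in_tile=64.0, overlap=0.2):
    """Yield (x, y, size) origins of square tiles covering the image.

    Tile size in pixels is derived from ground sampling distance:
    size = METERS_IN_TILE / GSD, so tiles cover a fixed ground area
    regardless of image resolution.
    """
    size = max(1, int(meters_in_tile / gsd_m))
    step = max(1, int(size * (1.0 - overlap)))  # stride leaves overlap
    for y in range(0, height_px, step):
        for x in range(0, width_px, step):
            # Clamp edge tiles back inside the image bounds.
            yield (min(x, max(0, width_px - size)),
                   min(y, max(0, height_px - size)),
                   size)
```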