Azaion.Detections — Architecture
1. System Context
Problem being solved: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
System boundaries:
- Inside: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- Outside: Loader service (model storage), Annotations service (result persistence + auth), client applications
External systems:
| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
2. Technology Stack
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |
3. Deployment Model
Infrastructure: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).
Environment-specific configuration:
| Config | Development | Production |
|---|---|---|
| LOADER_URL | http://loader:8080 (default) | Environment variable |
| ANNOTATIONS_URL | http://annotations:8080 (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (Logs/log_inference_YYYYMMDD.txt, 30-day retention) |
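The defaults in the table above suggest a conventional env-var-with-fallback pattern. A minimal sketch, assuming the variable names from the table (the module-level constants are illustrative, not the service's actual code):

```python
import os

# Service URLs resolve from the environment in production and fall back to
# Docker-Compose-style hostnames in development, per the config table.
LOADER_URL = os.environ.get("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.environ.get("ANNOTATIONS_URL", "http://annotations:8080")
```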
4. Data Model Overview
Core entities:
| Entity | Description | Owned By Component |
|---|---|---|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |
Key relationships:
- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
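The relationships above can be sketched as plain-Python dataclasses. This is illustrative only: the real service implements these entities as Cython cdef classes (see ADR-001), and the field names are assumptions inferred from the payload description in section 5.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationClass:
    class_num: int
    name: str
    color: str          # display color for the class
    max_size_m: float   # max physical object size, per the entity table

@dataclass
class Detection:
    center_x: float
    center_y: float
    width: float
    height: float
    class_num: int      # many-to-one: resolved via annotations_dict lookup
    confidence: float

@dataclass
class Annotation:
    media_id: str
    video_time: float
    # One-to-many: all detections for a single frame/tile.
    detections: list[Detection] = field(default_factory=list)
```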
Data flow summary:
- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
5. Integration Points
Internal Communication
| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
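The API-to-pipeline row above combines two patterns: synchronous inference dispatched through a thread pool, and lazy creation of the pipeline on first use. A minimal sketch of that call shape, assuming hypothetical names (`get_pipeline`, `run_detection`) and a placeholder for the real engine initialization:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_executor = ThreadPoolExecutor(max_workers=2)  # matches the 2-parallel-jobs NFR
_pipeline = None
_lock = Lock()

def get_pipeline():
    """Create the pipeline on first call; later calls reuse the instance."""
    global _pipeline
    with _lock:  # guard against two first-requests racing the slow init
        if _pipeline is None:
            _pipeline = object()  # stand-in for model download + engine setup
    return _pipeline

async def run_detection(payload):
    # Blocking inference runs off the event loop so SSE streaming stays live.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, lambda: (get_pipeline(), payload))
```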
External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
Annotations Service Contract
Detections → Annotations is the primary outbound integration. During async media detection (POST /detect/{media_id}), each detection batch is posted to the Annotations service for persistence and downstream sync.
Endpoint: POST {ANNOTATIONS_URL}/annotations
Trigger: Each valid annotation batch during F3 (async media detection), only when the original client request included an Authorization header.
Payload sent by Detections: mediaId, source (AI=0), videoTime, list of Detection objects (centerX, centerY, width, height, classNum, label, confidence), and optional base64 image. userId is not included — resolved from the JWT by Annotations. The Annotations API contract also accepts description, affiliation, and combatReadiness on each Detection, but Detections does not populate these.
Responses: 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).
Auth: Bearer JWT forwarded from the client. For long-running video, auto-refreshed via POST {ANNOTATIONS_URL}/auth/refresh (TokenManager, 60s pre-expiry window).
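The refresh timing described above can be sketched as follows: decode the JWT payload without signature verification and refresh once inside the 60-second pre-expiry window. `TokenManager` is named in the text; the two helper functions here are assumptions about its internals:

```python
import base64
import json
import time

def jwt_exp(token: str) -> int:
    """Read the exp claim from a JWT payload. No signature verification,
    matching the behaviour described in section 7."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]

def needs_refresh(token: str, window_s: int = 60) -> bool:
    """True once the token is within window_s seconds of expiry."""
    return jwt_exp(token) - time.time() < window_s
```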
Downstream effect (Annotations side):
- Annotation persisted to local PostgreSQL (image hashed to XxHash64 ID)
- SSE event published to UI subscribers
- Annotation ID enqueued to annotations_queue_records → FailsafeProducer → RabbitMQ Stream (azaion-annotations) for central DB sync and AI training
Failure isolation: All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.
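The failure-isolation behaviour above amounts to a fire-and-forget POST. A minimal sketch, assuming the endpoint and payload shape from this section (the helper name and timeout are illustrative):

```python
import requests

def post_annotation_silently(annotations_url: str, token: str, body: dict) -> None:
    """POST one annotation batch; swallow every error so detection
    processing and SSE streaming are never stalled by Annotations outages."""
    try:
        requests.post(
            f"{annotations_url}/annotations",
            json=body,
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    except Exception:
        pass  # deliberately silent, per the failure-isolation note above
```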
See _docs/02_document/modules/main.md § "Annotations Service Integration" for field-level schema detail.
6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
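The SSE queue-depth target in the table above implies bounded per-client queues. A sketch of one plausible drop-on-full policy, assuming `asyncio.Queue` with `maxsize=100` as stated (the drop behaviour itself is an assumption, not confirmed by the source):

```python
import asyncio

def make_client_queue() -> asyncio.Queue:
    """One bounded event queue per SSE subscriber."""
    return asyncio.Queue(maxsize=100)

def publish(queue: asyncio.Queue, event: dict) -> bool:
    """Non-blocking publish: a slow client loses events rather than
    back-pressuring the inference pipeline."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False  # event dropped for this client
```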
7. Security Architecture
Authentication: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.
Authorization: None at the detection service level. Auth is delegated to the Annotations service.
Data protection:
- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials
Audit logging: Inference activity logged to daily rotated files. No auth audit logging.
8. Key Architectural Decisions
ADR-001: Cython for Inference Pipeline
Context: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
Decision: Implement the inference pipeline, data models, and engines as Cython cdef classes with typed variables.
Alternatives considered:
- Pure Python — rejected due to loop-heavy postprocessing performance
- C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
Consequences: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.
ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
Context: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
Decision: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
Alternatives considered:
- TensorRT only — rejected; would break CPU-only development/testing
- ONNX only — rejected; significantly slower on GPU vs TensorRT
Consequences: Two code paths to maintain. GPU-specific engine files cached per architecture.
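The module-load GPU check in ADR-002 can be sketched with pynvml. The NVML calls are real API names; the ≥ 6.1 compute-capability threshold comes from section 6, and the returned engine labels are placeholders for the actual engine classes:

```python
def select_engine() -> str:
    """Pick TensorRT when a compatible NVIDIA GPU is present,
    otherwise fall back to ONNX Runtime on CPU."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        pynvml.nvmlShutdown()
        if (major, minor) >= (6, 1):  # threshold from the NFR table
            return "tensorrt"
    except Exception:
        pass  # no NVML, no GPU, or driver error → fall through
    return "onnx-cpu"  # portable fallback for CPU-only machines
```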
ADR-003: Lazy Inference Initialization
Context: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
Decision: Inference is created on first actual detection request, not at app startup. Health endpoint works without engine.
Consequences: First detection request has higher latency. AIAvailabilityStatus reports state transitions during initialization.
ADR-004: Large Image Tiling with GSD-Based Sizing
Context: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
Decision: Split large images into tiles sized by ground sampling distance (METERS_IN_TILE / GSD pixels) with configurable overlap. Deduplicate detections across tile boundaries.
Consequences: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
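The GSD-based sizing in ADR-004 reduces to one formula: tile edge in pixels = METERS_IN_TILE / GSD (smaller GSD means higher resolution, so more pixels cover the same ground area). A sketch under that formula; the 128 m default and the overlapped-grid helper are assumptions for illustration:

```python
def tile_size_px(gsd_m_per_px: float, meters_in_tile: float = 128.0) -> int:
    """Pixels needed for one tile to cover meters_in_tile of ground.
    At GSD = 0.1 m/px this yields 1280 px, the model's input size."""
    return int(round(meters_in_tile / gsd_m_per_px))

def tile_origins(length_px: int, tile_px: int, overlap: float = 0.2) -> list[int]:
    """Tile start offsets along one axis with the configured overlap;
    the final tile is shifted back so it stays inside the image."""
    step = max(1, int(tile_px * (1 - overlap)))
    origins = list(range(0, max(1, length_px - tile_px + 1), step))
    if origins[-1] + tile_px < length_px:
        origins.append(length_px - tile_px)
    return origins
```

Detections from adjacent tiles then overlap in the shared margin, which is why the pipeline needs the proximity-based deduplication noted above.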