# Azaion.Detections — Architecture
## 1. System Context
**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
**System boundaries**:
- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |
## 3. Deployment Model
**Infrastructure**: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |
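The environment-driven configuration above can be sketched as follows. This is a minimal illustration, not the service's actual startup code: the variable names and defaults come from the table, while the loguru sink parameters are an assumption inferred from the stated filename pattern and retention policy.

```python
import os

# Service endpoints: compose-style hostnames by default, overridable via
# environment variables in production (per the table above).
LOADER_URL = os.getenv("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.getenv("ANNOTATIONS_URL", "http://annotations:8080")

try:
    from loguru import logger

    # Daily-rotated inference log with 30-day retention. The exact sink
    # arguments are an assumption matching the documented file pattern.
    logger.add(
        "Logs/log_inference_{time:YYYYMMDD}.txt",
        rotation="00:00",     # start a new file at midnight
        retention="30 days",  # drop files older than 30 days
        delay=True,           # create the file on first write
    )
except ImportError:
    pass  # loguru optional in this sketch; console logging still works
```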
## 4. Data Model Overview
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |
**Key relationships**:
- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
**Data flow summary**:
- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
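The core entities above are implemented as Cython `cdef` classes in the real pipeline; a plain-Python approximation of their shape, assuming field names that follow the API contract in § 5, might look like this (illustrative only — exact attribute names may differ):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnotationClass:
    """Detection class metadata (name, color, max physical size)."""
    class_id: int
    name: str
    color: str                                  # e.g. "#FF0000"
    max_physical_size: Optional[float] = None   # metres; used for size filtering


@dataclass
class Detection:
    """A single bounding box with class and confidence (centre-based coords)."""
    center_x: float
    center_y: float
    width: float
    height: float
    class_num: int
    label: str
    confidence: float


@dataclass
class Annotation:
    """All detections for one frame/tile, plus the source image."""
    media_id: str
    video_time: float
    detections: list = field(default_factory=list)
    image: Optional[bytes] = None
```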
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
#### Annotations Service Contract
Detections → Annotations is the primary outbound integration. During async media detection (`POST /detect/{media_id}`), each detection batch is posted to the Annotations service for persistence and downstream sync.
**Endpoint:** `POST {ANNOTATIONS_URL}/annotations`
**Trigger:** Each valid annotation batch during F3 (async media detection), only when the original client request included an Authorization header.
**Payload sent by Detections:** `mediaId`, `source` (AI=0), `videoTime`, list of Detection objects (`centerX`, `centerY`, `width`, `height`, `classNum`, `label`, `confidence`), and optional base64 `image`. `userId` is not included — resolved from the JWT by Annotations. The Annotations API contract also accepts `description`, `affiliation`, and `combatReadiness` on each Detection, but Detections does not populate these.
**Responses:** 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).
**Auth:** Bearer JWT forwarded from the client. For long-running video, auto-refreshed via `POST {ANNOTATIONS_URL}/auth/refresh` (TokenManager, 60s pre-expiry window).
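A sketch of how the payload described above could be assembled. The field names (`mediaId`, `source`, `videoTime`, `centerX`, …) come from this document; the function name, the `detections` key, and the attribute names on the detection objects are assumptions for illustration:

```python
import base64


def build_annotation_payload(media_id, video_time, detections, image_bytes=None):
    """Assemble a body for POST {ANNOTATIONS_URL}/annotations.

    `detections` is any iterable of objects carrying the attributes used
    below. Note `userId` is deliberately absent: the Annotations service
    resolves it from the Bearer JWT.
    """
    payload = {
        "mediaId": media_id,
        "source": 0,  # 0 = AI-generated
        "videoTime": video_time,
        "detections": [  # key name is an assumption; schema per the text above
            {
                "centerX": d.center_x, "centerY": d.center_y,
                "width": d.width, "height": d.height,
                "classNum": d.class_num, "label": d.label,
                "confidence": d.confidence,
            }
            for d in detections
        ],
    }
    if image_bytes is not None:
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload
```

The optional `description`, `affiliation`, and `combatReadiness` fields accepted by the Annotations contract are omitted here, matching the note above that Detections does not populate them.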
**Downstream effect (Annotations side):**
1. Annotation persisted to local PostgreSQL (image hashed to XxHash64 ID)
2. SSE event published to UI subscribers
3. Annotation ID enqueued to `annotations_queue_records` → FailsafeProducer → RabbitMQ Stream (`azaion-annotations`) for central DB sync and AI training
**Failure isolation:** All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.
See `_docs/02_document/modules/main.md` § "Annotations Service Integration" for field-level schema detail.
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
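The first two limits in the table can be expressed directly in code. A minimal sketch (the overflow policy — dropping events when a client's buffer is full — is an assumption; the document only states the queue depth):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# At most two inference jobs run concurrently; further requests wait.
inference_pool = ThreadPoolExecutor(max_workers=2)


def make_sse_queue() -> asyncio.Queue:
    """Per-client SSE buffer capped at 100 events."""
    return asyncio.Queue(maxsize=100)


async def publish(queue: asyncio.Queue, event) -> bool:
    """Non-blocking publish; returns False when the client's buffer is full.

    Dropping rather than blocking keeps a slow SSE consumer from stalling
    the inference pipeline (assumed policy).
    """
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False
```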
## 7. Security Architecture
**Authentication**: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.
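The local, signature-free expiry check described above can be sketched like this. The 60-second window matches the TokenManager note in § 5; function names are illustrative:

```python
import base64
import json
import time

REFRESH_WINDOW_S = 60  # refresh when less than 60 s of validity remains


def jwt_exp(token: str) -> int:
    """Read `exp` from a JWT payload WITHOUT verifying the signature.

    The service only needs the expiry time to schedule a token refresh;
    actual validation is delegated to the Annotations service.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]


def needs_refresh(token: str, now: float = None) -> bool:
    now = time.time() if now is None else now
    return jwt_exp(token) - now < REFRESH_WINDOW_S
```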
**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.
**Data protection**:
- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials
**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.
## 8. Key Architectural Decisions
### ADR-001: Cython for Inference Pipeline
**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.
**Alternatives considered**:
1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
**Consequences**: Build step required (setup.py + Cython compilation); IDE support and debugging are more complex.
### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
**Alternatives considered**:
1. TensorRT only — rejected; would break CPU-only development/testing
2. ONNX only — rejected; significantly slower on GPU vs TensorRT
**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.
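The engine-selection check at module load time might look like the sketch below. The ≥ 6.1 threshold comes from the NFR table; the function names and the exact fallback behaviour are assumptions (the pynvml calls shown are real NVML bindings):

```python
MIN_CAPABILITY = (6, 1)  # NFR table: compute capability >= 6.1


def meets_min_capability(major, minor, minimum=MIN_CAPABILITY):
    """Pure comparison used by the engine-selection logic."""
    return (major, minor) >= minimum


def select_engine() -> str:
    """Return 'tensorrt' if a compatible NVIDIA GPU is visible, else 'onnx'."""
    try:
        import pynvml
        pynvml.nvmlInit()
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
            if meets_min_capability(major, minor):
                return "tensorrt"
    except Exception:
        pass  # no NVML / no compatible GPU: fall through to the CPU path
    return "onnx"
```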
### ADR-003: Lazy Inference Initialization
**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.
**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.
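The lazy-initialization pattern in ADR-003 can be sketched as a double-checked-locking wrapper. `factory` stands in for the real `Inference` constructor (which downloads the model and may trigger TensorRT conversion); the class name is illustrative:

```python
import threading


class LazyInference:
    """Defer engine construction to the first detection call (ADR-003)."""

    def __init__(self, factory):
        self._factory = factory
        self._engine = None
        self._lock = threading.Lock()

    def get(self):
        if self._engine is None:           # fast path: no lock once built
            with self._lock:
                if self._engine is None:   # double-checked: init exactly once
                    self._engine = self._factory()
        return self._engine
```

The health endpoint never calls `get()`, so it responds immediately even while the first detection request is still paying the initialization cost.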
### ADR-004: Large Image Tiling with GSD-Based Sizing
**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.
**Consequences**: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
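The GSD-based sizing and overlapping grid from ADR-004 can be sketched as below. The `METERS_IN_TILE` value, the 20% default overlap, and the clamping to the model's 1280-pixel input are illustrative assumptions; cross-tile deduplication is not shown:

```python
METERS_IN_TILE = 100.0  # illustrative; the real constant lives in config


def tile_size_px(gsd_m_per_px: float, model_input: int = 1280) -> int:
    """Tile edge in pixels so each tile covers METERS_IN_TILE metres of
    ground, never smaller than the model's fixed input size."""
    return max(model_input, round(METERS_IN_TILE / gsd_m_per_px))


def tile_origins(width: int, height: int, tile: int, overlap: float = 0.2):
    """Top-left corners of overlapping tiles covering a width x height image."""
    step = max(1, int(tile * (1.0 - overlap)))
    xs = list(range(0, max(width - tile, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - tile, 0) + 1, step)) or [0]
    # Ensure the right and bottom edges are covered by a final tile.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]
```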