Azaion.Detections — Architecture
1. System Context
Problem being solved: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
System boundaries:
- Inside: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- Outside: Loader service (model storage), Annotations service (result persistence + auth), client applications
External systems:
| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
2. Technology Stack
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |
3. Deployment Model
Infrastructure: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).
Environment-specific configuration:
| Config | Development | Production |
|---|---|---|
| LOADER_URL | http://loader:8080 (default) | Environment variable |
| ANNOTATIONS_URL | http://annotations:8080 (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (Logs/log_inference_YYYYMMDD.txt, 30-day retention) |
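The defaults in the table above suggest a conventional env-var-with-fallback pattern. A minimal sketch, assuming the variable names from the table (the module-level constants are illustrative, not the service's actual code):

```python
import os

# Service URLs resolve from the environment in production and fall back to
# Docker-Compose-style hostnames in development, per the config table.
LOADER_URL = os.environ.get("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.environ.get("ANNOTATIONS_URL", "http://annotations:8080")
```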
4. Data Model Overview
Core entities:
| Entity | Description | Owned By Component |
|---|---|---|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |
Key relationships:
- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
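The relationships above can be sketched as plain-Python dataclasses. This is illustrative only: the real service implements these entities as Cython cdef classes (see ADR-001), and the field names are assumptions inferred from the payload description in section 5.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationClass:
    class_num: int
    name: str
    color: str          # display color for the class
    max_size_m: float   # max physical object size, per the entity table

@dataclass
class Detection:
    center_x: float
    center_y: float
    width: float
    height: float
    class_num: int      # many-to-one: resolved via annotations_dict lookup
    confidence: float

@dataclass
class Annotation:
    media_id: str
    video_time: float
    # One-to-many: all detections for a single frame/tile.
    detections: list[Detection] = field(default_factory=list)
```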
Data flow summary:
- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
5. Integration Points
Internal Communication
| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
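The API-to-pipeline row above combines two patterns: synchronous inference dispatched through a thread pool, and lazy creation of the pipeline on first use. A minimal sketch of that call shape, assuming hypothetical names (`get_pipeline`, `run_detection`) and a placeholder for the real engine initialization:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

_executor = ThreadPoolExecutor(max_workers=2)  # matches the 2-parallel-jobs NFR
_pipeline = None
_lock = Lock()

def get_pipeline():
    """Create the pipeline on first call; later calls reuse the instance."""
    global _pipeline
    with _lock:  # guard against two first-requests racing the slow init
        if _pipeline is None:
            _pipeline = object()  # stand-in for model download + engine setup
    return _pipeline

async def run_detection(payload):
    # Blocking inference runs off the event loop so SSE streaming stays live.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, lambda: (get_pipeline(), payload))
```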
External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
Annotations Service Contract
Detections → Annotations is the primary outbound integration. During async media detection (POST /detect/{media_id}), each detection batch is posted to the Annotations service for persistence and downstream sync.
Endpoint: POST {ANNOTATIONS_URL}/annotations
Trigger: Each valid annotation batch during F3 (async media detection), only when the original client request included an Authorization header.
Payload sent by Detections: mediaId, source (AI=0), videoTime, list of Detection objects (centerX, centerY, width, height, classNum, label, confidence), and optional base64 image. userId is not included — resolved from the JWT by Annotations. The Annotations API contract also accepts description, affiliation, and combatReadiness on each Detection, but Detections does not populate these.
Responses: 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).
Auth: Bearer JWT forwarded from the client. For long-running video, auto-refreshed via POST {ANNOTATIONS_URL}/auth/refresh (TokenManager, 60s pre-expiry window).
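The refresh timing described above can be sketched as follows: decode the JWT payload without signature verification and refresh once inside the 60-second pre-expiry window. `TokenManager` is named in the text; the two helper functions here are assumptions about its internals:

```python
import base64
import json
import time

def jwt_exp(token: str) -> int:
    """Read the exp claim from a JWT payload. No signature verification,
    matching the behaviour described in section 7."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]

def needs_refresh(token: str, window_s: int = 60) -> bool:
    """True once the token is within window_s seconds of expiry."""
    return jwt_exp(token) - time.time() < window_s
```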
Downstream effect (Annotations side):
- Annotation persisted to local PostgreSQL (image hashed to XxHash64 ID)
- SSE event published to UI subscribers
- Annotation ID enqueued to annotations_queue_records → FailsafeProducer → RabbitMQ Stream (azaion-annotations) for central DB sync and AI training
Failure isolation: All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.
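The failure-isolation behaviour above amounts to a fire-and-forget POST. A minimal sketch, assuming the endpoint and payload shape from this section (the helper name and timeout are illustrative):

```python
import requests

def post_annotation_silently(annotations_url: str, token: str, body: dict) -> None:
    """POST one annotation batch; swallow every error so detection
    processing and SSE streaming are never stalled by Annotations outages."""
    try:
        requests.post(
            f"{annotations_url}/annotations",
            json=body,
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    except Exception:
        pass  # deliberately silent, per the failure-isolation note above
```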
See _docs/02_document/modules/main.md § "Annotations Service Integration" for field-level schema detail.
6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
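The SSE queue-depth target in the table above implies bounded per-client queues. A sketch of one plausible drop-on-full policy, assuming `asyncio.Queue` with `maxsize=100` as stated (the drop behaviour itself is an assumption, not confirmed by the source):

```python
import asyncio

def make_client_queue() -> asyncio.Queue:
    """One bounded event queue per SSE subscriber."""
    return asyncio.Queue(maxsize=100)

def publish(queue: asyncio.Queue, event: dict) -> bool:
    """Non-blocking publish: a slow client loses events rather than
    back-pressuring the inference pipeline."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False  # event dropped for this client
```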
7. Security Architecture
Authentication: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.
Authorization: None at the detection service level. Auth is delegated to the Annotations service.
Data protection:
- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials
Audit logging: Inference activity logged to daily rotated files. No auth audit logging.
8. Key Architectural Decisions
ADR-001: Cython for Inference Pipeline
Context: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
Decision: Implement the inference pipeline, data models, and engines as Cython cdef classes with typed variables.
Alternatives considered:
- Pure Python — rejected due to loop-heavy postprocessing performance
- C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
Consequences: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.
ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
Context: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
Decision: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
Alternatives considered:
- TensorRT only — rejected; would break CPU-only development/testing
- ONNX only — rejected; significantly slower on GPU vs TensorRT
Consequences: Two code paths to maintain. GPU-specific engine files cached per architecture.
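The module-load GPU check in ADR-002 can be sketched with pynvml. The NVML calls are real API names; the ≥ 6.1 compute-capability threshold comes from section 6, and the returned engine labels are placeholders for the actual engine classes:

```python
def select_engine() -> str:
    """Pick TensorRT when a compatible NVIDIA GPU is present,
    otherwise fall back to ONNX Runtime on CPU."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        pynvml.nvmlShutdown()
        if (major, minor) >= (6, 1):  # threshold from the NFR table
            return "tensorrt"
    except Exception:
        pass  # no NVML, no GPU, or driver error → fall through
    return "onnx-cpu"  # portable fallback for CPU-only machines
```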
ADR-003: Lazy Inference Initialization
Context: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
Decision: Inference is created on first actual detection request, not at app startup. Health endpoint works without engine.
Consequences: First detection request has higher latency. AIAvailabilityStatus reports state transitions during initialization.
ADR-004: Large Image Tiling with GSD-Based Sizing
Context: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
Decision: Split large images into tiles sized by ground sampling distance (METERS_IN_TILE / GSD pixels) with configurable overlap. Deduplicate detections across tile boundaries.
Consequences: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
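The GSD-based sizing in ADR-004 reduces to one formula: tile edge in pixels = METERS_IN_TILE / GSD (smaller GSD means higher resolution, so more pixels cover the same ground area). A sketch under that formula; the 128 m default and the overlapped-grid helper are assumptions for illustration:

```python
def tile_size_px(gsd_m_per_px: float, meters_in_tile: float = 128.0) -> int:
    """Pixels needed for one tile to cover meters_in_tile of ground.
    At GSD = 0.1 m/px this yields 1280 px, the model's input size."""
    return int(round(meters_in_tile / gsd_m_per_px))

def tile_origins(length_px: int, tile_px: int, overlap: float = 0.2) -> list[int]:
    """Tile start offsets along one axis with the configured overlap;
    the final tile is shifted back so it stays inside the image."""
    step = max(1, int(tile_px * (1 - overlap)))
    origins = list(range(0, max(1, length_px - tile_px + 1), step))
    if origins[-1] + tile_px < length_px:
        origins.append(length_px - tile_px)
    return origins
```

Detections from adjacent tiles then overlap in the shared margin, which is why the pipeline needs the proximity-based deduplication noted above.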