# Azaion.Detections — Architecture

## 1. System Context

**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.

**System boundaries**:

- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for the API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |

## 3. Deployment Model

**Infrastructure**: Containerized microservice, deployed alongside the Loader and Annotations services (likely Docker Compose or Kubernetes, given service discovery by hostname).

**Environment-specific configuration**:

| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |

## 4. Data Model Overview

**Core entities**:

| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |

**Key relationships**:

- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)

**Data flow summary**:

- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)

## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |

#### Annotations Service Contract

Detections → Annotations is the primary outbound integration. During async media detection (`POST /detect/{media_id}`), each detection batch is posted to the Annotations service for persistence and downstream sync.

**Endpoint:** `POST {ANNOTATIONS_URL}/annotations`

**Trigger:** Each valid annotation batch during F3 (async media detection), and only when the original client request included an Authorization header.

**Payload sent by Detections:** `mediaId`, `source` (AI=0), `videoTime`, a list of Detection objects (`centerX`, `centerY`, `width`, `height`, `classNum`, `label`, `confidence`), and an optional base64 `image`. `userId` is not included — it is resolved from the JWT by Annotations. The Annotations API contract also accepts `description`, `affiliation`, and `combatReadiness` on each Detection, but Detections does not populate these.

**Responses:** 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).

**Auth:** Bearer JWT forwarded from the client. For long-running video, the token is auto-refreshed via `POST {ANNOTATIONS_URL}/auth/refresh` (TokenManager, 60 s pre-expiry window).
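As an illustration of this contract, here is a minimal client-side sketch. Field names (`mediaId`, `source`, `videoTime`, `image`) come from the payload description above; the `detections` JSON key, the function names, and the 10-second timeout are assumptions, not the service's actual code.

```python
import base64

import requests

ANNOTATIONS_URL = "http://annotations:8080"  # development default from the deployment table


def build_annotation_payload(media_id, video_time, detections, image_bytes=None):
    """Assemble one annotation batch. `userId` is deliberately absent:
    the Annotations service resolves it from the Bearer JWT."""
    payload = {
        "mediaId": media_id,
        "source": 0,  # AI = 0
        "videoTime": video_time,
        # each detection dict: centerX, centerY, width, height,
        # classNum, label, confidence
        "detections": detections,
    }
    if image_bytes is not None:
        payload["image"] = base64.b64encode(image_bytes).decode("ascii")
    return payload


def post_annotation(payload, bearer_token):
    """Post a batch; any failure is swallowed, mirroring the documented
    failure isolation (detection processing continues regardless)."""
    try:
        resp = requests.post(
            f"{ANNOTATIONS_URL}/annotations",
            json=payload,
            headers={"Authorization": f"Bearer {bearer_token}"},
            timeout=10,
        )
        return resp.status_code == 201
    except requests.RequestException:
        return False
```

The silent `except` is intentional here: it reproduces the "Exception silently caught" failure mode from the integrations table, trading observability for stream continuity.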
**Downstream effect (Annotations side):**

1. Annotation persisted to local PostgreSQL (image hashed to an XxHash64 ID)
2. SSE event published to UI subscribers
3. Annotation ID enqueued to `annotations_queue_records` → FailsafeProducer → RabbitMQ Stream (`azaion-annotations`) for central DB sync and AI training

**Failure isolation:** All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.

See `_docs/02_document/modules/main.md` § "Annotations Service Integration" for field-level schema detail.

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |

## 7. Security Architecture

**Authentication**: Pass-through Bearer JWT from the client, forwarded to the Annotations service. The JWT `exp` claim is decoded locally (base64, no signature verification) for token refresh timing.

**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.

**Data protection**:

- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at the application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials

**Audit logging**: Inference activity is logged to daily rotated files. No auth audit logging.

## 8. Key Architectural Decisions

### ADR-001: Cython for Inference Pipeline

**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
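The local, signature-free `exp` decode that drives refresh timing can be sketched as follows. This is a plausible illustration, not the service's TokenManager code; function names and the default window are taken from the 60 s pre-expiry figure above.

```python
import base64
import json
import time


def jwt_expiry(token: str) -> float:
    """Extract the `exp` claim from a JWT without verifying the signature.

    The middle segment of a JWT is base64url-encoded JSON; JWTs strip the
    trailing '=' padding, so it must be restored before decoding.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(claims["exp"])


def needs_refresh(token: str, window_s: float = 60.0) -> bool:
    """True when the token expires within the pre-expiry window."""
    return jwt_expiry(token) - time.time() <= window_s
```

Note the security trade-off this section already calls out: because no signature check is performed, this decode is safe only for scheduling refreshes, never for authorization decisions.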
**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.

**Alternatives considered**:

1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers near-C speed with Python-like syntax

**Consequences**: A build step is required (setup.py + Cython compilation). IDE support and debugging are more complex.

### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)

**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.

**Decision**: Check for a GPU at module load time. If a compatible NVIDIA GPU is found, use TensorRT; otherwise fall back to ONNX Runtime. Convert ONNX → TensorRT in the background and cache the resulting engine.

**Alternatives considered**:

1. TensorRT only — rejected; would break CPU-only development/testing
2. ONNX only — rejected; significantly slower on GPU than TensorRT

**Consequences**: Two code paths to maintain. GPU-specific engine files are cached per architecture.

### ADR-003: Lazy Inference Initialization

**Context**: Engine initialization is slow (model download, possible conversion). The API should start accepting health checks immediately.

**Decision**: `Inference` is created on the first actual detection request, not at app startup. The health endpoint works without an engine.

**Consequences**: The first detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.

### ADR-004: Large Image Tiling with GSD-Based Sizing

**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). A simple resize would lose small-object detail.

**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.

**Consequences**: More complex pipeline. Tile deduplication relies on a coordinate-proximity threshold.
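The module-load GPU check behind ADR-002 might look like the sketch below. The pynvml calls are the library's real API (matching the "pynvml check at startup" NFR and the compute capability ≥ 6.1 requirement); the selection function itself is illustrative, not the service's code.

```python
def select_engine() -> str:
    """Return "tensorrt" when a compatible NVIDIA GPU (compute capability
    >= 6.1) is present, else "onnx" for the CPU fallback path."""
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            for i in range(pynvml.nvmlDeviceGetCount()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
                if (major, minor) >= (6, 1):
                    return "tensorrt"
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        pass  # pynvml missing or no NVIDIA driver -> CPU-only machine
    return "onnx"
```

Catching a broad `Exception` around the NVML calls is what makes the CPU-only development path from ADR-002 work: any missing driver or library simply selects the ONNX engine.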
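ADR-004's GSD-based tile sizing and cross-tile deduplication can be illustrated with a sketch. The 100 m tile size, 0.2 overlap, and center-distance dedup rule are placeholder values chosen for the example, not the service's actual constants or algorithm.

```python
def _starts(length: int, side: int, step: int) -> list[int]:
    """Tile start offsets along one axis, always covering the far edge."""
    starts = list(range(0, max(length - side, 0) + 1, step))
    if starts[-1] + side < length:
        starts.append(length - side)  # final tile flush with the image edge
    return starts


def tile_grid(img_w, img_h, gsd, meters_in_tile=100.0, overlap=0.2):
    """Compute (x, y, w, h) tile boxes for an image.

    Tile side in pixels is METERS_IN_TILE / GSD, so each tile covers a
    constant ground area regardless of image resolution.
    """
    side = min(int(round(meters_in_tile / gsd)), img_w, img_h)
    step = max(1, int(side * (1.0 - overlap)))
    return [(x, y, side, side)
            for y in _starts(img_h, side, step)
            for x in _starts(img_w, side, step)]


def dedupe(dets, center_eps=10.0):
    """Greedy dedup across tile overlaps: keep the highest-confidence
    detection and drop same-class detections whose centers fall within
    center_eps pixels of a kept one (a simple proximity-threshold stand-in)."""
    kept = []
    for d in sorted(dets, key=lambda d: -d["confidence"]):
        if all(d["classNum"] != k["classNum"]
               or abs(d["centerX"] - k["centerX"]) > center_eps
               or abs(d["centerY"] - k["centerY"]) > center_eps
               for k in kept):
            kept.append(d)
    return kept
```

For example, a 1000×1000 px image at 0.25 m/px GSD yields 400 px tiles with a 320 px stride, so the grid is 3×3 with the last row/column snapped to the image edge; objects in the 80 px overlap strips are detected twice and collapsed by `dedupe`.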