mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 21:56:33 +00:00
86d8e7e22d
Made-with: Cursor
175 lines
9.6 KiB
Markdown
# Azaion.Detections — Architecture

## 1. System Context

**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.

**System boundaries**:

- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|------------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|------------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |

## 3. Deployment Model

**Infrastructure**: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).

**Environment-specific configuration**:

| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |

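The development defaults above can be sketched as plain environment-variable lookups (the variable names and default URLs are from the table; the lookup code itself is an illustrative assumption, not the service's actual config module):

```python
import os

# Development defaults match the config table; production deployments
# override them via environment variables (Compose/Kubernetes).
LOADER_URL = os.environ.get("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.environ.get("ANNOTATIONS_URL", "http://annotations:8080")
```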
## 4. Data Model Overview

**Core entities**:

| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |

**Key relationships**:

- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in `annotations_dict`)
- Annotation → Media: many-to-one (multiple annotations per video/image)

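As a sketch, the entities and relationships above map onto plain Python dataclasses (field names follow the Annotations payload contract in §5; the real implementation uses Cython `cdef` classes, so this is illustrative only):

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationClass:
    # Detection class metadata: name, color, max physical size
    class_num: int
    name: str
    color: str
    max_size_m: float

@dataclass
class Detection:
    # One bounding box; class_num resolves to an AnnotationClass
    # via a class-ID lookup dict (many-to-one)
    center_x: float
    center_y: float
    width: float
    height: float
    class_num: int
    label: str
    confidence: float

@dataclass
class Annotation:
    # One frame/tile holds many detections (one-to-many)
    media_id: str
    video_time: float
    detections: list = field(default_factory=list)
```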
**Data flow summary**:

- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)

## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|-----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |

#### Annotations Service Contract

Detections → Annotations is the primary outbound integration. During async media detection (`POST /detect/{media_id}`), each detection batch is posted to the Annotations service for persistence and downstream sync.

**Endpoint:** `POST {ANNOTATIONS_URL}/annotations`

**Trigger:** Each valid annotation batch during F3 (async media detection), only when the original client request included an Authorization header.

**Payload sent by Detections:** `mediaId`, `source` (AI=0), `videoTime`, list of Detection objects (`centerX`, `centerY`, `width`, `height`, `classNum`, `label`, `confidence`), and optional base64 `image`. `userId` is not included — resolved from the JWT by Annotations. The Annotations API contract also accepts `description`, `affiliation`, and `combatReadiness` on each Detection, but Detections does not populate these.

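A sketch of how such a payload could be assembled (field names are from the contract above; the helper function and dict layout are assumptions, not the service's verbatim code):

```python
def build_annotation_payload(media_id, video_time, detections, image_b64=None):
    # `userId` is deliberately absent: the Annotations service resolves it
    # from the bearer JWT. `description`, `affiliation`, `combatReadiness`
    # are accepted by the contract but never populated by Detections.
    payload = {
        "mediaId": media_id,
        "source": 0,  # AI = 0
        "videoTime": video_time,
        "detections": [
            {
                "centerX": d["centerX"],
                "centerY": d["centerY"],
                "width": d["width"],
                "height": d["height"],
                "classNum": d["classNum"],
                "label": d["label"],
                "confidence": d["confidence"],
            }
            for d in detections
        ],
    }
    if image_b64 is not None:
        payload["image"] = image_b64  # optional base64 frame/tile image
    return payload
```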
**Responses:** 201 Created, 400 Bad Request (missing image/mediaId), 404 Not Found (unknown mediaId).

**Auth:** Bearer JWT forwarded from the client. For long-running video, auto-refreshed via `POST {ANNOTATIONS_URL}/auth/refresh` (TokenManager, 60s pre-expiry window).

**Downstream effect (Annotations side):**

1. Annotation persisted to local PostgreSQL (image hashed to XxHash64 ID)
2. SSE event published to UI subscribers
3. Annotation ID enqueued to `annotations_queue_records` → FailsafeProducer → RabbitMQ Stream (`azaion-annotations`) for central DB sync and AI training

**Failure isolation:** All POST failures are silently caught. Detection processing and SSE streaming continue regardless of Annotations service availability.

See `_docs/02_document/modules/main.md` § "Annotations Service Integration" for field-level schema detail.

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|-------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |

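The first two targets can be expressed directly with stdlib constructs (the limits come from the table; the surrounding names are illustrative):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# At most two inference jobs run in parallel; further requests queue.
inference_executor = ThreadPoolExecutor(max_workers=2)

def sse_queue():
    # Each SSE client gets a bounded queue: at 100 pending events the
    # producer must drop or wait rather than grow without limit.
    return asyncio.Queue(maxsize=100)
```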
## 7. Security Architecture

**Authentication**: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.

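That local exp decode can be sketched with the stdlib alone (no signature verification, exactly as noted; the function names and the 60-second window constant mirror the TokenManager description in §5 but are assumptions):

```python
import base64
import json
import time

REFRESH_WINDOW_S = 60  # refresh this long before expiry

def jwt_exp(token):
    # JWT = header.payload.signature; only the payload is inspected and
    # the signature is NOT verified (verification is delegated to the
    # Annotations service).
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(claims["exp"])

def needs_refresh(token, now=None):
    # True once we are inside the pre-expiry window.
    now = time.time() if now is None else now
    return jwt_exp(token) - now <= REFRESH_WINDOW_S
```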
**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.

**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.

**Data protection**:

- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials

**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.

## 8. Key Architectural Decisions

### ADR-001: Cython for Inference Pipeline

**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.

**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.

**Alternatives considered**:

1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers near-C speed with Python-like syntax

**Consequences**: Build step required (setup.py + Cython compilation). IDE support and debugging are more complex.

### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)

**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.

**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.

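The decision can be sketched as a load-time check (the pynvml calls are the library's real API; the return labels, use of device index 0, and the overall wiring are assumptions, while the ≥ 6.1 threshold comes from §6):

```python
MIN_COMPUTE_CAPABILITY = (6, 1)  # from the NFR table in §6

def select_engine():
    # Performed once at module load: prefer TensorRT when a compatible
    # NVIDIA GPU is present, otherwise fall back to ONNX Runtime on CPU.
    try:
        import pynvml
        pynvml.nvmlInit()
        if pynvml.nvmlDeviceGetCount() == 0:
            return "onnx-cpu"
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        if (major, minor) >= MIN_COMPUTE_CAPABILITY:
            return "tensorrt"
        return "onnx-cpu"
    except Exception:
        # No pynvml, no driver, or NVML error: the CPU path keeps
        # CPU-only development and testing machines working.
        return "onnx-cpu"
```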
**Alternatives considered**:
|
||
1. TensorRT only — rejected; would break CPU-only development/testing
|
||
2. ONNX only — rejected; significantly slower on GPU vs TensorRT
|
||
|
||
**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.
|
||
|
||
### ADR-003: Lazy Inference Initialization

**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.

**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.

**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.

### ADR-004: Large Image Tiling with GSD-Based Sizing

**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). A simple resize would lose small-object detail.

**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.

**Consequences**: More complex pipeline. Tile deduplication relies on a coordinate-proximity threshold.
