Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Author: Oleksandr Bezdieniezhnykh
Date: 2026-03-22 16:15:49 +02:00
Parent: 60ebe686ff
Commit: 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
# Azaion.Detections — Architecture
## 1. System Context
**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
**System boundaries**:
- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |
## 3. Deployment Model
**Infrastructure**: Containerized microservice, deployed alongside the Loader and Annotations services. Service discovery by hostname suggests Docker Compose or Kubernetes.
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |
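The endpoint defaults in the table above can be sketched as a small config helper. This is an illustrative shape only, not the service's actual config module; the environment variable names come from the table, the function name is hypothetical:

```python
import os

def service_urls(env=os.environ):
    """Resolve service endpoints; defaults match the development column above,
    production supplies overrides via environment variables."""
    return {
        "loader": env.get("LOADER_URL", "http://loader:8080"),
        "annotations": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.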
## 4. Data Model Overview
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |
**Key relationships**:
- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
**Data flow summary**:
- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
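The entity shapes and the Annotation → Detection one-to-many relationship can be sketched with dataclasses. These are illustrative shapes only; per ADR-001 the real models are Cython `cdef` classes, and the field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    class_id: int            # resolves to an AnnotationClass via annotations_dict
    confidence: float
    bbox: tuple              # (x1, y1, x2, y2) in image pixels

@dataclass
class Annotation:
    frame_index: int
    detections: list = field(default_factory=list)   # one-to-many per frame/tile
```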
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
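The "exception silently caught" failure mode for the Annotations service amounts to a broad try/except around the POST. A minimal sketch, assuming a `requests`-based client as in the stack table; the endpoint path and function name are hypothetical:

```python
import requests

def post_detections(base_url, token, payload):
    """Forward detection results with the pass-through Bearer JWT.
    Failures are swallowed, matching the silent-catch behaviour above:
    detection continues even if result persistence is unavailable."""
    try:
        requests.post(
            f"{base_url}/annotations",          # hypothetical path
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    except requests.RequestException:
        pass  # silently caught; no retry, no error surfaced to the caller
```

The trade-off noted in section 5 applies: clients never learn that persistence failed.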
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
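The two concurrency limits in the table map directly onto standard-library primitives. A minimal sketch with the values from the table; the names are illustrative, not the service's actual identifiers:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# At most 2 inference jobs run in parallel (ThreadPoolExecutor workers row).
inference_pool = ThreadPoolExecutor(max_workers=2)

def make_sse_queue():
    """Per-client SSE buffer; producers see the queue as full
    once 100 events are pending (asyncio.Queue maxsize row)."""
    return asyncio.Queue(maxsize=100)
```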
## 7. Security Architecture
**Authentication**: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.
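The local `exp` decode described above reads the JWT payload as base64url without touching the signature. A sketch under that description; the function names and the 60-second margin are assumptions:

```python
import base64
import json
import time

def jwt_exp(token: str) -> int:
    """Read the exp claim from a JWT without verifying the signature,
    as described above. base64url padding is restored before decoding."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]

def needs_refresh(token: str, margin: int = 60) -> bool:
    """True when the token expires within `margin` seconds (assumed value)."""
    return jwt_exp(token) - time.time() < margin
```

Note that skipping signature verification is safe here only because the token is merely forwarded; the Annotations service performs the actual validation.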
**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.
**Data protection**:
- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials
**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.
## 8. Key Architectural Decisions
### ADR-001: Cython for Inference Pipeline
**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.
**Alternatives considered**:
1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
**Consequences**: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.
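For illustration, the kind of tight floating-point loop this ADR targets is box-overlap math over many detections, shown here in plain Python; the real pipeline compiles such code as typed Cython:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes: the sort of
    per-pair floating-point arithmetic that is slow in pure Python and is
    moved into Cython cdef code per ADR-001."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

In Cython the same function becomes a `cdef` with `double` arguments, eliminating per-operation interpreter overhead in the postprocessing loop.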
### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
**Alternatives considered**:
1. TensorRT only — rejected; would break CPU-only development/testing
2. ONNX only — rejected; significantly slower on GPU vs TensorRT
**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.
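The module-load GPU check can be sketched as follows. The `pynvml` calls are the library's real API; the returned engine names are placeholders for the two engine implementations:

```python
def select_engine():
    """Pick the inference backend at module load time (ADR-002):
    TensorRT when a compatible NVIDIA GPU is present, else ONNX Runtime."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        if (major, minor) >= (6, 1):   # compatibility floor from section 6
            return "tensorrt"
    except Exception:
        pass  # no NVIDIA driver, no GPU, or pynvml missing
    return "onnx"                      # CPU fallback keeps dev machines working
```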
### ADR-003: Lazy Inference Initialization
**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.
**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.
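The lazy creation can be sketched as a double-checked lazy singleton; the names are illustrative, and the locking detail is an assumption rather than a documented part of the service:

```python
import threading

_inference = None
_lock = threading.Lock()

def get_inference(factory):
    """Create the Inference object on the first detection request (ADR-003).
    Double-checked locking keeps two concurrent first requests from
    both running the slow factory (model download, possible conversion)."""
    global _inference
    if _inference is None:
        with _lock:
            if _inference is None:
                _inference = factory()   # slow path, runs exactly once
    return _inference
```

The health endpoint never calls `get_inference`, so it responds before any engine exists.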
### ADR-004: Large Image Tiling with GSD-Based Sizing
**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.
**Consequences**: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
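The GSD-based sizing can be sketched as follows. The `METERS_IN_TILE / GSD` formula is from the ADR; the default of 64 m per tile and 20% overlap are assumed values, not the service's actual constants:

```python
def tile_origins(length, tile, step):
    """1-D tile start positions covering [0, length); the last tile is
    shifted back so the image edge is always covered."""
    if length <= tile:
        return [0]
    xs = list(range(0, length - tile, step))
    xs.append(length - tile)
    return xs

def tile_grid(width, height, gsd, meters_in_tile=64.0, overlap=0.2):
    """(x, y, size) tiles for a large image (ADR-004). gsd is the ground
    sampling distance in meters per pixel; tile edge = METERS_IN_TILE / GSD."""
    tile = int(meters_in_tile / gsd)
    step = max(1, int(tile * (1.0 - overlap)))   # overlapping stride
    return [(x, y, tile)
            for y in tile_origins(height, tile, step)
            for x in tile_origins(width, tile, step)]
```

At a GSD of 0.05 m/px this yields 1280 px tiles, matching the model's fixed input size; detections in the overlap bands are then deduplicated by coordinate proximity as the ADR describes.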