Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Author: Oleksandr Bezdieniezhnykh
Date: 2026-03-22 16:15:49 +02:00
Parent: 60ebe686ff
Commit: 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
# Azaion.Detections — Architecture
## 1. System Context
**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.
**System boundaries**:
- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |
## 2. Technology Stack
| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |
## 3. Deployment Model
**Infrastructure**: Containerized microservice, deployed alongside the Loader and Annotations services. Service discovery by hostname suggests Docker Compose or Kubernetes.
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |
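The endpoint defaults in the table above can be sketched as a small config helper. This is an illustrative shape only, not the service's actual config module; the environment variable names come from the table, the function name is hypothetical:

```python
import os

def service_urls(env=os.environ):
    """Resolve service endpoints; defaults match the development column above,
    production supplies overrides via environment variables."""
    return {
        "loader": env.get("LOADER_URL", "http://loader:8080"),
        "annotations": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.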
## 4. Data Model Overview
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |
**Key relationships**:
- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)
**Data flow summary**:
- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)
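The entity shapes and the Annotation → Detection one-to-many relationship can be sketched with dataclasses. These are illustrative shapes only; per ADR-001 the real models are Cython `cdef` classes, and the field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    class_id: int            # resolves to an AnnotationClass via annotations_dict
    confidence: float
    bbox: tuple              # (x1, y1, x2, y2) in image pixels

@dataclass
class Annotation:
    frame_index: int
    detections: list = field(default_factory=list)   # one-to-many per frame/tile
```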
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |
### External Integrations
| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |
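The "exception silently caught" failure mode for the Annotations service amounts to a broad try/except around the POST. A minimal sketch, assuming a `requests`-based client as in the stack table; the endpoint path and function name are hypothetical:

```python
import requests

def post_detections(base_url, token, payload):
    """Forward detection results with the pass-through Bearer JWT.
    Failures are swallowed, matching the silent-catch behaviour above:
    detection continues even if result persistence is unavailable."""
    try:
        requests.post(
            f"{base_url}/annotations",          # hypothetical path
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
    except requests.RequestException:
        pass  # silently caught; no retry, no error surfaced to the caller
```

The trade-off noted in section 5 applies: clients never learn that persistence failed.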
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |
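The two concurrency limits in the table map directly onto standard-library primitives. A minimal sketch with the values from the table; the names are illustrative, not the service's actual identifiers:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# At most 2 inference jobs run in parallel (ThreadPoolExecutor workers row).
inference_pool = ThreadPoolExecutor(max_workers=2)

def make_sse_queue():
    """Per-client SSE buffer; producers see the queue as full
    once 100 events are pending (asyncio.Queue maxsize row)."""
    return asyncio.Queue(maxsize=100)
```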
## 7. Security Architecture
**Authentication**: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.
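The local `exp` decode described above reads the JWT payload as base64url without touching the signature. A sketch under that description; the function names and the 60-second margin are assumptions:

```python
import base64
import json
import time

def jwt_exp(token: str) -> int:
    """Read the exp claim from a JWT without verifying the signature,
    as described above. base64url padding is restored before decoding."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]

def needs_refresh(token: str, margin: int = 60) -> bool:
    """True when the token expires within `margin` seconds (assumed value)."""
    return jwt_exp(token) - time.time() < margin
```

Note that skipping signature verification is safe here only because the token is merely forwarded; the Annotations service performs the actual validation.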
**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.
**Data protection**:
- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials
**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.
## 8. Key Architectural Decisions
### ADR-001: Cython for Inference Pipeline
**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.
**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.
**Alternatives considered**:
1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax
**Consequences**: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.
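For illustration, the kind of tight floating-point loop this ADR targets is box-overlap math over many detections, shown here in plain Python; the real pipeline compiles such code as typed Cython:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes: the sort of
    per-pair floating-point arithmetic that is slow in pure Python and is
    moved into Cython cdef code per ADR-001."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

In Cython the same function becomes a `cdef` with `double` arguments, eliminating per-operation interpreter overhead in the postprocessing loop.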
### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)
**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.
**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.
**Alternatives considered**:
1. TensorRT only — rejected; would break CPU-only development/testing
2. ONNX only — rejected; significantly slower on GPU vs TensorRT
**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.
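The module-load GPU check can be sketched as follows. The `pynvml` calls are the library's real API; the returned engine names are placeholders for the two engine implementations:

```python
def select_engine():
    """Pick the inference backend at module load time (ADR-002):
    TensorRT when a compatible NVIDIA GPU is present, else ONNX Runtime."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        if (major, minor) >= (6, 1):   # compatibility floor from section 6
            return "tensorrt"
    except Exception:
        pass  # no NVIDIA driver, no GPU, or pynvml missing
    return "onnx"                      # CPU fallback keeps dev machines working
```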
### ADR-003: Lazy Inference Initialization
**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.
**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.
**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.
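The lazy creation can be sketched as a double-checked lazy singleton; the names are illustrative, and the locking detail is an assumption rather than a documented part of the service:

```python
import threading

_inference = None
_lock = threading.Lock()

def get_inference(factory):
    """Create the Inference object on the first detection request (ADR-003).
    Double-checked locking keeps two concurrent first requests from
    both running the slow factory (model download, possible conversion)."""
    global _inference
    if _inference is None:
        with _lock:
            if _inference is None:
                _inference = factory()   # slow path, runs exactly once
    return _inference
```

The health endpoint never calls `get_inference`, so it responds before any engine exists.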
### ADR-004: Large Image Tiling with GSD-Based Sizing
**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.
**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.
**Consequences**: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
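The GSD-based sizing can be sketched as follows. The `METERS_IN_TILE / GSD` formula is from the ADR; the default of 64 m per tile and 20% overlap are assumed values, not the service's actual constants:

```python
def tile_origins(length, tile, step):
    """1-D tile start positions covering [0, length); the last tile is
    shifted back so the image edge is always covered."""
    if length <= tile:
        return [0]
    xs = list(range(0, length - tile, step))
    xs.append(length - tile)
    return xs

def tile_grid(width, height, gsd, meters_in_tile=64.0, overlap=0.2):
    """(x, y, size) tiles for a large image (ADR-004). gsd is the ground
    sampling distance in meters per pixel; tile edge = METERS_IN_TILE / GSD."""
    tile = int(meters_in_tile / gsd)
    step = max(1, int(tile * (1.0 - overlap)))   # overlapping stride
    return [(x, y, tile)
            for y in tile_origins(height, tile, step)
            for x in tile_origins(width, tile, step)]
```

At a GSD of 0.05 m/px this yields 1280 px tiles, matching the model's fixed input size; detections in the overlap bands are then deduplicated by coordinate proximity as the ADR describes.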