Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,49 @@
# Acceptance Criteria
## Detection Accuracy
- Detections with confidence below `probability_threshold` (default: 0.25) are filtered out.
- Overlapping detections with containment ratio > `tracking_intersection_threshold` (default: 0.6) are deduplicated, keeping the higher-confidence detection.
- Tile duplicate detections are identified when all bounding box coordinates differ by less than 0.01 (TILE_DUPLICATE_CONFIDENCE_THRESHOLD).
- Physical size filtering: detections exceeding `max_object_size_meters` for their class (defined in classes.json, range 2–20 meters) are removed.
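The confidence and containment rules above can be sketched as follows. This is a minimal illustration, not the service's implementation; function and variable names are placeholders, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def containment_ratio(inner, outer):
    # Fraction of `inner`'s area that lies inside `outer`.
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0

def filter_detections(dets, probability_threshold=0.25,
                      tracking_intersection_threshold=0.6):
    # dets: list of (box, confidence). Drop low-confidence detections,
    # then deduplicate overlaps, keeping the higher-confidence one.
    dets = [d for d in dets if d[1] >= probability_threshold]
    dets.sort(key=lambda d: d[1], reverse=True)  # higher confidence wins
    kept = []
    for box, conf in dets:
        if all(containment_ratio(box, kb) <= tracking_intersection_threshold
               for kb, _ in kept):
            kept.append((box, conf))
    return kept
```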
## Video Processing
- Frame sampling: every Nth frame processed, controlled by `frame_period_recognition` (default: 4).
- Minimum annotation interval: `frame_recognition_seconds` (default: 2 seconds) between reported annotations.
- Tracking: new annotation accepted if any detection moved beyond `tracking_distance_confidence` threshold or confidence increased beyond `tracking_probability_increase`.
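The two sampling rules combine as sketched below: every Nth frame is considered, and a frame is eligible for a reported annotation only if the minimum interval has passed. A simplification under assumed names; the real pipeline also applies the movement/confidence tracking checks.

```python
def sample_frames(frame_count, fps, frame_period_recognition=4,
                  frame_recognition_seconds=2):
    # Yield frame indices that are both on the Nth-frame grid and
    # at least `frame_recognition_seconds` after the last yielded one.
    min_gap = int(frame_recognition_seconds * fps)
    last_reported = -min_gap
    for idx in range(0, frame_count, frame_period_recognition):
        if idx - last_reported >= min_gap:
            last_reported = idx
            yield idx
```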
## Image Processing
- Images ≤ 1.5× model dimensions (1280×1280): processed as a single frame.
- Larger images: tiled based on ground sampling distance. Tile physical size: 25 meters (METERS_IN_TILE). Tile overlap: `big_image_tile_overlap_percent` (default: 20%).
- GSD calculation: `sensor_width * altitude / (focal_length * image_width)`.
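Plugging the default camera parameters from the configuration table (sensor width 23.5 mm, altitude 400 m, focal length 24 mm) into the GSD formula gives the tile size in pixels; the 8000 px image width below is an illustrative value, not a spec default.

```python
def gsd_meters_per_pixel(sensor_width_mm, altitude_m,
                         focal_length_mm, image_width_px):
    # GSD = sensor_width * altitude / (focal_length * image_width)
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)

def tile_size_px(gsd, meters_in_tile=25.0):
    # Pixels covering one 25 m physical tile (METERS_IN_TILE).
    return int(meters_in_tile / gsd)

# Example: defaults with an assumed 8000 px wide aerial image.
gsd = gsd_meters_per_pixel(23.5, 400, 24, 8000)   # ~0.049 m/px
tile = tile_size_px(gsd)                          # ~510 px per tile
```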
## API
- `GET /health` always returns `status: "healthy"` (even if engine is unavailable — aiAvailability indicates actual state).
- `POST /detect` returns detection results synchronously. Errors: 400 (empty/invalid image), 422 (runtime error), 503 (engine unavailable).
- `POST /detect/{media_id}` returns immediately with `{"status": "started"}`. Rejects duplicate media_id with 409.
- `GET /detect/stream` delivers SSE events with `mediaStatus` values: AIProcessing, AIProcessed, Error.
- SSE queue maximum depth: 100 events per client. Overflow is silently dropped.
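The bounded-queue overflow behavior can be sketched with the standard library; `publish` and `event_queues` are hypothetical names, but the drop-on-full semantics match the rule above.

```python
import queue

MAX_QUEUE_DEPTH = 100  # per-client SSE queue depth

def publish(event_queues, event):
    # Fan one event out to every subscriber queue;
    # when a queue is full, the event is silently dropped.
    for q in event_queues:
        try:
            q.put_nowait(event)
        except queue.Full:
            pass  # overflow is silently dropped
```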
## Engine Lifecycle
- Engine initialization is lazy (first detection request, not startup).
- Status transitions: NONE → DOWNLOADING → (CONVERTING → UPLOADING →) ENABLED | WARNING | ERROR.
- GPU check: NVIDIA GPU with compute capability ≥ 6.1.
- TensorRT conversion uses FP16 precision when GPU supports fast FP16.
- Background conversion does not block API responsiveness.
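Lazy initialization on the first detection request might look like the following double-checked sketch. The class and status strings mirror the lifecycle above, but the structure is an assumption, not the service's actual code.

```python
import threading

class LazyEngine:
    # Engine loads on the first detect() call, not at startup.
    def __init__(self, loader):
        self._loader = loader      # callable that downloads/builds the engine
        self._engine = None
        self._lock = threading.Lock()
        self.status = "NONE"

    def detect(self, frame):
        if self._engine is None:
            with self._lock:
                if self._engine is None:  # double-checked init
                    self.status = "DOWNLOADING"
                    self._engine = self._loader()
                    self.status = "ENABLED"
        return self._engine(frame)
```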
## Logging
- Log files: `Logs/log_inference_YYYYMMDD.txt`.
- Rotation: daily.
- Retention: 30 days.
- Console: INFO/DEBUG/SUCCESS to stdout, WARNING+ to stderr.
## Object Classes
- 19 base detection classes defined in `classes.json`.
- 3 weather modes (Norm, Wint, Night) — total up to 57 class variants.
- Each class has: Id, Name, Color, MaxSizeM (max physical size in meters).
@@ -0,0 +1,76 @@
# Input Data Parameters
## Media Input
### Single Image Detection (POST /detect)
| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| file | bytes (multipart) | Client upload | Image file (JPEG, PNG, etc. — any format OpenCV can decode) |
| config | JSON string (optional) | Query/form field | AIConfigDto overrides |
### Media Detection (POST /detect/{media_id})
| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| media_id | string | URL path | Identifier for media in the Loader service |
| AIConfigDto body | JSON (optional) | Request body | Configuration overrides |
| Authorization header | Bearer token | HTTP header | JWT for Annotations service |
| x-refresh-token header | string | HTTP header | Refresh token for JWT renewal |
Media files (images and videos) are resolved by the Inference pipeline via paths in the config. The Loader service provides model files, not media files directly.
## Configuration Input (AIConfigDto / AIRecognitionConfig)
| Field | Type | Default | Range/Meaning |
|-------|------|---------|---------------|
| frame_period_recognition | int | 4 | Process every Nth video frame |
| frame_recognition_seconds | int | 2 | Minimum seconds between video annotations |
| probability_threshold | float | 0.25 | Minimum detection confidence (0..1) |
| tracking_distance_confidence | float | 0.0 | Movement threshold for tracking (model-width fraction) |
| tracking_probability_increase | float | 0.0 | Confidence increase threshold for tracking |
| tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
| model_batch_size | int | 1 | Inference batch size |
| big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0-100%) |
| altitude | float | 400 | Camera altitude in meters |
| focal_length | float | 24 | Camera focal length in mm |
| sensor_width | float | 23.5 | Camera sensor width in mm |
| paths | list[str] | [] | Media file paths to process |
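A plausible shape for `AIRecognitionConfig.from_dict()` is sketched below. The field names and defaults come from the table; the dataclass form and the ignore-unknown-keys behavior are assumptions.

```python
from dataclasses import dataclass, field, fields

@dataclass
class AIRecognitionConfig:
    # Defaults mirror the configuration table above.
    frame_period_recognition: int = 4
    frame_recognition_seconds: int = 2
    probability_threshold: float = 0.25
    tracking_distance_confidence: float = 0.0
    tracking_probability_increase: float = 0.0
    tracking_intersection_threshold: float = 0.6
    model_batch_size: int = 1
    big_image_tile_overlap_percent: int = 20
    altitude: float = 400.0
    focal_length: float = 24.0
    sensor_width: float = 23.5
    paths: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        # Keep defaults for missing keys; ignore unknown ones.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```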
## Model Files
| File | Format | Source | Description |
|------|--------|--------|-------------|
| azaion.onnx | ONNX | Loader service | Base detection model |
| azaion.cc_{M}.{m}_sm_{N}.engine | TensorRT | Loader service (cached) | GPU-specific compiled engine |
## Static Data
### classes.json
Array of 19 objects, each with:
| Field | Type | Example | Description |
|-------|------|---------|-------------|
| Id | int | 0 | Class identifier |
| Name | string | "ArmorVehicle" | English class name |
| ShortName | string | "Броня" ("Armor") | Ukrainian short name |
| Color | string | "#ff0000" | Hex color for visualization |
| MaxSizeM | int | 8 | Maximum physical object size in meters |
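Using the example values from the table, one entry in the `classes.json` array would look like:

```json
[
  {
    "Id": 0,
    "Name": "ArmorVehicle",
    "ShortName": "Броня",
    "Color": "#ff0000",
    "MaxSizeM": 8
  }
]
```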
## Data Volumes
- Single image: up to tens of megapixels (aerial imagery). Large images are tiled.
- Video: processed frame-by-frame with configurable sampling rate.
- Model file: ONNX model size depends on architecture (typically 10-100 MB). TensorRT engines are GPU-specific compiled versions.
- Detection output: up to 300 detections per frame (model limit).
## Data Formats
| Data | Format | Serialization |
|------|--------|---------------|
| API requests | HTTP multipart / JSON | Pydantic validation |
| API responses | JSON | Pydantic model_dump |
| SSE events | text/event-stream | JSON per event |
| Internal config | Python dict | AIRecognitionConfig.from_dict() |
| Legacy (unused) | msgpack | serialize() / from_msgpack() |
@@ -0,0 +1,28 @@
# Problem Statement
## What is this system?
Azaion.Detections is an AI-powered object detection microservice designed for aerial reconnaissance. It processes drone and satellite imagery (both still images and video) to automatically identify and locate military and infrastructure objects — including armored vehicles, trucks, artillery, trenches, personnel, camouflage nets, buildings, and more.
## What problem does it solve?
Manual analysis of aerial imagery is slow, error-prone, and does not scale. When monitoring large areas from drones or satellites, a human analyst cannot review every frame in real time. This service automates the detection process: given an image or video feed, it returns structured bounding boxes with object classifications and confidence scores, enabling rapid situational awareness.
## Who are the users?
- **Client applications** that submit media for analysis (via HTTP API)
- **Downstream services** (Annotations service) that store and present detection results
- **Real-time consumers** that subscribe to Server-Sent Events for live detection updates during video processing
## How does it work at a high level?
1. A client sends an image or triggers detection on media files available in the Loader service
2. The service preprocesses frames — resizing, normalizing, and for large aerial images, splitting into GSD-based tiles to preserve small object detail
3. Frames are batched and run through a YOLO-based object detection model via TensorRT (GPU) or ONNX Runtime (CPU fallback)
4. Raw model output is postprocessed: coordinate normalization, confidence thresholding, overlapping detection removal, physical size filtering, and tile deduplication
5. Results are returned as structured DTOs (bounding box center, dimensions, class label, confidence)
6. For video/batch processing, results are streamed in real-time via SSE and optionally posted to an external Annotations service
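Steps 2–5 can be condensed into a minimal pipeline sketch. All callables here are placeholders for the real preprocessing, model, and postprocessing stages; nothing below is the service's actual API.

```python
def run_detection(frames, preprocess, model, postprocess, emit):
    # Minimal per-frame pipeline: preprocess -> infer -> postprocess -> emit.
    for frame in frames:
        batch = preprocess(frame)      # resize / normalize / tile
        raw = model(batch)             # YOLO-based inference
        detections = postprocess(raw)  # threshold, dedupe, size filter
        emit(detections)               # HTTP response / SSE / Annotations
```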
## Domain context
The system operates in a military/defense aerial reconnaissance context. The 19 object classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier, Ammo, Protect.Struct) reflect objects of interest in ground surveillance. Three weather modes (Normal, Winter, Night) provide environment-specific detection variants. Physical size filtering using ground sampling distance ensures detections are physically plausible given camera altitude and optics.
@@ -0,0 +1,33 @@
# Restrictions
## Hardware
- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as workspace. Minimum ~2 GB GPU memory assumed (default fallback value).
- **Concurrency**: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.
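The two-worker restriction corresponds to a bounded executor along these lines; `submit_inference` is an illustrative name, not the service's actual entry point.

```python
from concurrent.futures import ThreadPoolExecutor

# At most 2 inference operations run concurrently; further
# submissions queue inside the executor until a worker frees up.
inference_pool = ThreadPoolExecutor(max_workers=2)

def submit_inference(fn, *args):
    return inference_pool.submit(fn, *args)
```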
## Software
- **Python 3** with Cython 3.1.3 compilation required (setup.py build step).
- **ONNX model**: `azaion.onnx` must be available via the Loader service.
- **TensorRT engine files** are GPU-architecture-specific (filename encodes compute capability and SM count) — not portable across different GPU models.
- **OpenCV 4.10.0** for image/video decoding and preprocessing.
- **classes.json** must exist in the working directory at startup — no fallback if missing.
- **Model input**: fixed 1280×1280 default for dynamic dimensions (hardcoded in TensorRT engine).
- **Model output**: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
## Environment
- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
- **Logging directory**: `Logs/` directory must be writable for loguru file output.
- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
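The environment variables above would be read with their documented defaults roughly as follows (a sketch of the convention, not the service's code):

```python
import os

# Service endpoints fall back to the documented defaults when unset.
LOADER_URL = os.environ.get("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.environ.get("ANNOTATIONS_URL", "http://annotations:8080")
```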
## Operational
- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- **No CORS configuration**: cross-origin requests are not explicitly handled.
- **No rate limiting**: the service has no built-in throttling.
- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- **Single-instance state**: `_active_detections` dict and `_event_queues` list are in-memory — not shared across instances or persistent across restarts.