Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,33 @@
# Restrictions
## Hardware
- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as its workspace; a minimum of ~2 GB of GPU memory is assumed (the default fallback value).
- **Concurrency**: the ThreadPoolExecutor is capped at 2 workers, so at most 2 inference operations run concurrently (see the sketch after this list).
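
A minimal sketch of the two restrictions above, assuming plain ONNX Runtime (the TensorRT path is omitted); `make_session` and `submit_inference` are hypothetical names, not the service's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

import onnxruntime as ort

# At most 2 in-flight inference calls, matching the worker cap above.
_executor = ThreadPoolExecutor(max_workers=2)

def make_session(model_path: str) -> ort.InferenceSession:
    # Prefer the CUDA provider when a compatible GPU is present;
    # otherwise fall back to CPU, mirroring the fallback described above.
    providers = ["CPUExecutionProvider"]
    if "CUDAExecutionProvider" in ort.get_available_providers():
        providers.insert(0, "CUDAExecutionProvider")
    return ort.InferenceSession(model_path, providers=providers)

def submit_inference(session: ort.InferenceSession, inputs: dict):
    # Returns a Future; the executor guarantees at most two run at once.
    return _executor.submit(session.run, None, inputs)
```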
## Software
- **Python 3** with Cython 3.1.3 is required; modules are compiled via the `setup.py` build step.
- **ONNX model**: `azaion.onnx` must be available via the Loader service.
- **TensorRT engine files** are GPU-architecture-specific (filename encodes compute capability and SM count) — not portable across different GPU models.
- **OpenCV 4.10.0** for image/video decoding and preprocessing.
- **classes.json** must exist in the working directory at startup — no fallback if missing.
- **Model input**: dynamic input dimensions default to a fixed 1280×1280 (hardcoded in the TensorRT engine).
- **Model output**: at most 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id); see the decoding sketch after this list.
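
How the hard `classes.json` requirement and the (≤300, 6) output format combine, as a minimal sketch; the array-of-names file layout and the 0.25 threshold are assumptions for illustration:

```python
import json

import numpy as np

# classes.json must exist at startup; there is deliberately no fallback,
# so a missing file fails loudly here. The array-of-names format is an
# assumption for this sketch; the real file layout may differ.
with open("classes.json") as f:
    CLASS_NAMES: list[str] = json.load(f)

def decode_detections(raw: np.ndarray, conf_threshold: float = 0.25) -> list[dict]:
    # `raw` has shape (N, 6) with N <= 300; each row is
    # (x1, y1, x2, y2, confidence, class_id) as documented above.
    detections = []
    for x1, y1, x2, y2, conf, class_id in raw:
        if conf < conf_threshold:
            continue
        detections.append({
            "box": [float(x1), float(y1), float(x2), float(y2)],
            "confidence": float(conf),
            "class": CLASS_NAMES[int(class_id)],
        })
    return detections
```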
## Environment
- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
- **Logging directory**: `Logs/` must be writable for loguru file output (see the sketch after this list).
- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
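
A minimal sketch of the environment defaults and the logging requirement; the log filename and rotation policy are illustrative assumptions, not the service's actual configuration:

```python
import os

from loguru import logger

# Defaults match the documented values; both services must be reachable.
LOADER_URL = os.getenv("LOADER_URL", "http://loader:8080")
ANNOTATIONS_URL = os.getenv("ANNOTATIONS_URL", "http://annotations:8080")

# loguru writes its file sink under Logs/, so the directory must be
# writable. The filename and rotation policy here are illustrative.
os.makedirs("Logs", exist_ok=True)
logger.add("Logs/service.log", rotation="10 MB")
```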
## Operational
- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- **No CORS configuration**: cross-origin requests are not explicitly handled.
- **No rate limiting**: the service has no built-in throttling.
- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- **Single-instance state**: the `_active_detections` dict and `_event_queues` list live only in process memory; they are neither shared across instances nor persisted across restarts (sketched below).
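
A minimal sketch of that process-local state, assuming an asyncio-based SSE fan-out; only the two variable names come from the restriction above, the surrounding logic is illustrative:

```python
import asyncio

# Process-local state: lost on restart and invisible to other replicas,
# which is why results must be consumed via HTTP/SSE or forwarded to the
# Annotations service rather than read back later.
_active_detections: dict[str, dict] = {}
_event_queues: list[asyncio.Queue] = []

async def subscribe() -> asyncio.Queue:
    # Each SSE client gets its own queue; nothing is persisted.
    queue: asyncio.Queue = asyncio.Queue()
    _event_queues.append(queue)
    return queue

async def publish_event(event: dict) -> None:
    # Fan one detection event out to every connected subscriber.
    for queue in _event_queues:
        await queue.put(event)
```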