mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 22:56:31 +00:00
# Restrictions
## Hardware
- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as workspace. Minimum ~2 GB GPU memory assumed (default fallback value).
- **Concurrency**: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.
## Software
- **Python 3** with Cython 3.1.3 compilation required (setup.py build step).
- **ONNX model**: `azaion.onnx` must be available via the Loader service.
- **TensorRT engine files** are GPU-architecture-specific (filename encodes compute capability and SM count) — not portable across different GPU models.
- **OpenCV 4.10.0** for image/video decoding and preprocessing.
- **classes.json** must exist in the working directory at startup — no fallback if missing.
- **Model input**: dynamic dimensions default to a fixed 1280×1280 (hardcoded in the TensorRT engine).
- **Model output**: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
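The output contract above (up to 300 rows of `(x1, y1, x2, y2, confidence, class_id)`) implies a simple post-processing step: drop padded or low-confidence rows. A minimal sketch, assuming the model pads unused slots with zeros; the function name and threshold are illustrative, not taken from the codebase.

```python
def filter_detections(detections, conf_threshold=0.5):
    """Keep detection rows whose confidence (index 4) meets the threshold.

    Each row is (x1, y1, x2, y2, confidence, class_id); zero-padded rows
    fall below any positive threshold and are discarded automatically.
    """
    return [d for d in detections if d[4] >= conf_threshold]

# A padded 300-row output with two real detections, only one confident enough.
raw = [[0.0] * 6 for _ in range(300)]
raw[0] = [10.0, 20.0, 110.0, 220.0, 0.9, 3.0]
raw[1] = [5.0, 5.0, 50.0, 50.0, 0.3, 1.0]

dets = filter_detections(raw, conf_threshold=0.5)
```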
## Environment
- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
- **Logging directory**: `Logs/` directory must be writable for loguru file output.
- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
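The environment variables above follow a default-with-override pattern. A minimal sketch of how such a configuration could be read; the `service_config` helper is hypothetical, but the variable names and default URLs are the ones listed above.

```python
import os

def service_config(env=None):
    """Resolve service endpoints from the environment with documented defaults."""
    env = os.environ if env is None else env
    return {
        "loader_url": env.get("LOADER_URL", "http://loader:8080"),
        "annotations_url": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }

# With no variables set, the documented defaults apply.
cfg = service_config(env={})

# An explicit variable overrides the default.
cfg_override = service_config(env={"LOADER_URL": "http://localhost:9090"})
```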
## Operational
- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- **No CORS configuration**: cross-origin requests are not explicitly handled.
- **No rate limiting**: the service has no built-in throttling.
- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- **Single-instance state**: `_active_detections` dict and `_event_queues` list are in-memory — not shared across instances or persistent across restarts.