# Restrictions
## Hardware
- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as workspace. A minimum of ~2 GB of GPU memory is assumed (the default fallback value).
- **Concurrency**: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.
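The 2-worker constraint above can be sketched with the standard library. This is a minimal illustration, not the service's actual code; `run_inference` is a hypothetical stand-in for the real model call.

```python
# Sketch of the 2-worker inference pool described above.
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)  # at most 2 concurrent inferences

def run_inference(frame_id: int) -> dict:
    # Placeholder for the real model call (illustrative only).
    return {"frame": frame_id, "detections": []}

def submit(frame_id: int):
    # A third submission queues until one of the 2 workers is free.
    return _executor.submit(run_inference, frame_id)

futures = [submit(i) for i in range(4)]
results = [f.result() for f in futures]
```

All four tasks complete, but no more than two ever run at once.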
## Software
- **Python 3** with Cython 3.1.3 compilation required (setup.py build step).
- **ONNX model**: `azaion.onnx` must be available via the Loader service.
- **TensorRT engine files** are GPU-architecture-specific (filename encodes compute capability and SM count) — not portable across different GPU models.
- **OpenCV 4.10.0** for image/video decoding and preprocessing.
- **classes.json** must exist in the working directory at startup — no fallback if missing.
- **Model input**: dynamic input dimensions default to a fixed 1280×1280 (hardcoded when the TensorRT engine is built).
- **Model output**: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
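The fixed-size output above (up to 300 rows of 6 values each) can be decoded as sketched below. This is an assumption-laden illustration: the field order follows the doc, the `Detection` type and the confidence-threshold padding convention are hypothetical.

```python
# Hedged sketch: decode a (≤300, 6) output of
# (x1, y1, x2, y2, confidence, class_id) rows.
from typing import NamedTuple

class Detection(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int

def decode(raw: list[list[float]], conf_threshold: float = 0.25) -> list[Detection]:
    """Keep rows above the threshold; low-confidence rows are treated as padding."""
    out: list[Detection] = []
    for x1, y1, x2, y2, conf, cls in raw:
        if conf >= conf_threshold:
            out.append(Detection(x1, y1, x2, y2, conf, int(cls)))
    return out
```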
## Environment
- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
- **Logging directory**: `Logs/` directory must be writable for loguru file output.
- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
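The two service endpoints above resolve with their documented defaults, roughly as follows (the helper name is illustrative, not from the service code):

```python
import os

def service_urls(env=os.environ) -> dict[str, str]:
    """Resolve Loader and Annotations endpoints with the documented defaults."""
    return {
        "loader": env.get("LOADER_URL", "http://loader:8080"),
        "annotations": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }
```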
## Operational
- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- **No CORS configuration**: cross-origin requests are not explicitly handled.
- **No rate limiting**: the service has no built-in throttling.
- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- **Single-instance state**: `_active_detections` dict and `_event_queues` list are in-memory — not shared across instances or persistent across restarts.