mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 21:36:32 +00:00
2.4 KiB
Restrictions
Hardware
- GPU: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- GPU memory: TensorRT model conversion uses 90% of available GPU memory as workspace. Minimum ~2 GB GPU memory assumed (default fallback value).
- Concurrency: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.
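The 2-worker cap above can be sketched as follows. This is an illustrative example, not the service's actual code; `run_inference` is a hypothetical placeholder for the real inference call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_inference(frame_id):
    # Placeholder for the real inference call (illustrative assumption).
    return f"processed {frame_id}"

# With max_workers=2, at most two run_inference calls execute
# concurrently; additional submissions wait in the executor's queue.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(run_inference, i) for i in range(5)]
    results = [f.result() for f in futures]
```

Because results are collected in submission order, callers see deterministic output even though only two tasks run at a time.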
Software
- Python 3 with Cython 3.1.3; the Cython sources must be compiled before running (setup.py build step).
- ONNX model: azaion.onnx must be available via the Loader service.
- TensorRT engine files are GPU-architecture-specific (the filename encodes compute capability and SM count) — not portable across different GPU models.
- OpenCV 4.10.0 for image/video decoding and preprocessing.
- classes.json must exist in the working directory at startup — no fallback if missing.
- Model input: dynamic dimensions default to a fixed 1280×1280 (hardcoded into the TensorRT engine).
- Model output: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
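Given the output layout above (up to 300 detections, 6 values each), a consumer might decode it like this. The function name, dict shape, and confidence threshold are illustrative assumptions; only the 6-value-per-detection layout comes from this document.

```python
def decode_detections(flat, conf_threshold=0.25):
    """Decode a flat buffer of detections laid out as
    (x1, y1, x2, y2, confidence, class_id) per detection."""
    detections = []
    for i in range(0, len(flat), 6):
        x1, y1, x2, y2, conf, cls = flat[i:i + 6]
        if conf >= conf_threshold:  # threshold is an assumed example value
            detections.append({
                "box": (x1, y1, x2, y2),
                "confidence": conf,
                "class_id": int(cls),
            })
    return detections

# Example: two detections, the second below the threshold.
flat = [10, 20, 110, 220, 0.9, 0,
        5, 5, 50, 50, 0.1, 3]
kept = decode_detections(flat)
```

Here only the first detection survives filtering, since the second has confidence 0.1.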
Environment
- LOADER_URL environment variable (default: http://loader:8080) — the Loader service must be reachable for model download/upload.
- ANNOTATIONS_URL environment variable (default: http://annotations:8080) — the Annotations service must be reachable for result posting and token refresh.
- Logging directory: Logs/ must be writable for loguru file output.
- No local model storage: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
Operational
- No persistent storage: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- No TLS at application level: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- No CORS configuration: cross-origin requests are not explicitly handled.
- No rate limiting: the service has no built-in throttling.
- No graceful shutdown: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- Single-instance state: the _active_detections dict and _event_queues list are in-memory — not shared across instances and not persistent across restarts.