# Input Data Parameters

## Media Input

### Upload Detection (POST /detect)

| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| file | bytes (multipart) | Client upload | Image or video file (JPEG, PNG, MP4, MOV, etc.) |
| config | JSON string (optional) | Form field | AIConfigDto overrides |
| Authorization header | Bearer token (optional) | HTTP header | JWT for media lifecycle management |
| x-refresh-token header | string (optional) | HTTP header | Refresh token for JWT renewal |

When auth headers are present, the service: computes an XxHash64 content hash, persists the file to `VIDEOS_DIR`/`IMAGES_DIR`, creates a media record via Annotations API, and tracks processing status.

### Media Detection (POST /detect/{media_id})

| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| media_id | string | URL path | Identifier for media in the Annotations service |
| AIConfigDto body | JSON (optional) | Request body | Configuration overrides (merged with DB settings) |
| Authorization header | Bearer token | HTTP header | JWT for Annotations service |
| x-refresh-token header | string | HTTP header | Refresh token for JWT renewal |

Media path is resolved from the Annotations service via `GET /api/media/{media_id}`. AI settings are fetched from `GET /api/users/{user_id}/ai-settings` and merged with client overrides.
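
The merge of stored AI settings with client overrides can be sketched roughly as follows. This is an illustration only: the function name `merge_ai_config` is hypothetical, and two behaviors are assumptions rather than documented facts, namely that client values take precedence over DB values and that overrides set to `None` are ignored.

```python
from typing import Any, Mapping, Optional


def merge_ai_config(
    db_settings: Mapping[str, Any],
    client_overrides: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Return a new dict: DB settings first, then client overrides on top.

    Assumption: a client value of None means "not provided" and leaves
    the DB value untouched.
    """
    merged = dict(db_settings)  # copy so the caller's mapping is not mutated
    for key, value in (client_overrides or {}).items():
        if value is not None:
            merged[key] = value
    return merged


# Example: the client raises the confidence threshold only.
db = {"probability_threshold": 0.25, "model_batch_size": 8}
effective = merge_ai_config(db, {"probability_threshold": 0.5})
```

Returning a fresh dict keeps the fetched settings reusable across requests, which may matter if they are cached per user.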

## Configuration Input (AIConfigDto / AIRecognitionConfig)

| Field | Type | Default | Range/Meaning |
|-------|------|---------|---------------|
| frame_period_recognition | int | 4 | Process every Nth video frame |
| frame_recognition_seconds | int | 2 | Minimum seconds between video annotations |
| probability_threshold | float | 0.25 | Minimum detection confidence (0..1) |
| tracking_distance_confidence | float | 0.0 | Movement threshold for tracking (model-width fraction) |
| tracking_probability_increase | float | 0.0 | Confidence increase threshold for tracking |
| tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
| model_batch_size | int | 8 | Inference batch size |
| big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0-100%) |
| altitude | float | optional | Camera altitude in meters. When omitted, GSD-based size filtering and image tiling are skipped. |
| focal_length | float | 24 | Camera focal length in mm |
| sensor_width | float | 23.5 | Camera sensor width in mm |

`paths` field was removed in AZ-174; media paths are now resolved via the Annotations service.

## Model Files

| File | Format | Source | Description |
|------|--------|--------|-------------|
| azaion.onnx | ONNX | Loader service | Base detection model |
| azaion.cc_{M}.{m}_sm_{N}.engine | TensorRT | Loader service (cached) | GPU-specific compiled engine |

## Static Data

### classes.json

Array of 19 objects, each with:

| Field | Type | Example | Description |
|-------|------|---------|-------------|
| Id | int | 0 | Class identifier |
| Name | string | "ArmorVehicle" | English class name |
| ShortName | string | "Броня" | Ukrainian short name |
| Color | string | "#ff0000" | Hex color for visualization |
| MaxSizeM | int | 8 | Maximum physical object size in meters |

## Data Volumes

- Single image: up to tens of megapixels (aerial imagery). Large images are tiled.
- Video: processed frame-by-frame with configurable sampling rate.
  Decoded from in-memory bytes via PyAV.
- Model file: ONNX model size depends on architecture (typically 10-100 MB). TensorRT engines are GPU-specific compiled versions.
- Detection output: up to 300 detections per frame (model limit).

## Data Formats

| Data | Format | Serialization |
|------|--------|---------------|
| API requests | HTTP multipart / JSON | Pydantic validation |
| API responses | JSON | Pydantic model_dump |
| SSE events | text/event-stream | JSON per event |
| Internal config | Python dict | AIRecognitionConfig.from_dict() |
| Content hash | XxHash64 hex string | 16-char hex digest |
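
To make the internal config row concrete, here is a minimal sketch of how a dict might be turned into a config object carrying the defaults from the Configuration Input table. The class `RecognitionConfigSketch` is a hypothetical stand-in, not the real `AIRecognitionConfig`; dropping unknown keys and `None` values is an assumption. The helper uses the standard ground sample distance formula for a nadir-pointing camera; how the service applies GSD to size filtering is not shown here.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RecognitionConfigSketch:
    """Hypothetical stand-in; defaults copied from the configuration table."""
    frame_period_recognition: int = 4
    frame_recognition_seconds: int = 2
    probability_threshold: float = 0.25
    tracking_distance_confidence: float = 0.0
    tracking_probability_increase: float = 0.0
    tracking_intersection_threshold: float = 0.6
    model_batch_size: int = 8
    big_image_tile_overlap_percent: int = 20
    altitude: Optional[float] = None  # meters; None means "not provided"
    focal_length: float = 24.0        # mm
    sensor_width: float = 23.5        # mm

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "RecognitionConfigSketch":
        known = {f.name for f in fields(cls)}
        # Assumption: unknown keys (e.g. the removed `paths`) and None values are dropped.
        kwargs = {k: v for k, v in data.items() if k in known and v is not None}
        return cls(**kwargs)


def ground_sample_distance_m(
    cfg: RecognitionConfigSketch, image_width_px: int
) -> Optional[float]:
    """Meters per pixel, or None when altitude is missing."""
    if cfg.altitude is None:
        return None
    # mm * m / (mm * px) leaves meters per pixel.
    return (cfg.sensor_width * cfg.altitude) / (cfg.focal_length * image_width_px)


cfg = RecognitionConfigSketch.from_dict({"altitude": 400, "paths": ["ignored"]})
gsd = ground_sample_distance_m(cfg, image_width_px=4000)
```

Returning `None` when altitude is absent mirrors the documented behavior that GSD-based filtering and tiling are skipped in that case.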