Merge branch 'dev' of https://github.com/azaion/detections into dev
ci/woodpecker/push/02-build-push Pipeline was successful

Oleksandr Bezdieniezhnykh
2026-05-17 13:19:25 +03:00
24 changed files with 345 additions and 68 deletions
+9 -2
@@ -30,9 +30,16 @@ steps:
echo "ERROR: fixtures/image_small.jpg is missing; cannot warm up Jetson engine"
exit 1
fi
ls -lh fixtures/image_small.jpg
if [ ! -f fixtures/video_test01.mp4 ]; then
echo "ERROR: fixtures/video_test01.mp4 is missing; cannot run Jetson video smoke test"
exit 1
fi
ls -lh fixtures/image_small.jpg fixtures/video_test01.mp4
bash scripts/pull_jetson_engine.sh
E2E_PROFILE=jetson bash run_test.sh tests/test_health_engine.py::TestHealthEngineStep03Warmed -rs
E2E_PROFILE=jetson bash run_test.sh \
tests/test_health_engine.py::TestHealthEngineStep03Warmed \
tests/test_video.py::test_ft_p_10_frame_sampling_ac1 \
-rs
volumes:
- /var/run/docker.sock:/var/run/docker.sock
+1 -1
@@ -17,7 +17,7 @@
- Images ≤ 1.5× model dimensions (1280×1280): processed as single frame.
- Larger images: tiled based on ground sampling distance. Tile physical size: 25 meters (METERS_IN_TILE). Tile overlap: `big_image_tile_overlap_percent` (default: 20%).
- GSD calculation: `sensor_width * altitude / (focal_length * image_width)` when `altitude` is provided.
- GSD calculation: `sensor_width * current_height / (focal_length * current_zoom * image_width * sin(current_angle))` when `camera_config.current_height` and valid camera parameters are provided. `current_angle` is in degrees and defaults to 90.
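The updated GSD formula can be sketched in plain Python as follows. This is only an illustration of the documented expression, not the service's Cython code; the function name `ground_sampling_distance` and the 4000 px image width in the usage line are assumptions.

```python
import math


def ground_sampling_distance(
    sensor_width: float,      # mm
    current_height: float,    # meters
    focal_length: float,      # mm
    image_width: int,         # pixels
    current_zoom: float = 1.0,
    current_angle: float = 90.0,  # degrees; 90 is nadir/downward
) -> float:
    """Return meters per pixel using the formula from the docs."""
    # The angle is configured in degrees, so convert before taking the sine.
    angle_scale = math.sin(math.radians(current_angle))
    # Zoom multiplies the focal length.
    effective_focal_length = focal_length * current_zoom
    return sensor_width * current_height / (
        effective_focal_length * image_width * angle_scale
    )


# Hypothetical usage: default optics, 400 m height, 4000 px wide image.
gsd = ground_sampling_distance(23.5, 400, 24, image_width=4000)
```

With `current_zoom = 1` and `current_angle = 90` the expression reduces to the previous `sensor_width * altitude / (focal_length * image_width)` form.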
## API
+11 -1
@@ -36,9 +36,19 @@ Media path is resolved from the Annotations service via `GET /api/media/{media_i
| tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
| model_batch_size | int | 8 | Inference batch size |
| big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0-100%) |
| altitude | float | optional | Camera altitude in meters. When omitted, GSD-based size filtering and image tiling are skipped. |
| camera_config | object | null | Camera parameters for GSD. When omitted or missing height, GSD-based size filtering and image tiling are skipped. |
### camera_config
| Field | Type | Default | Range/Meaning |
|-------|------|---------|---------------|
| focal_length | float | 24 | Camera focal length in mm |
| sensor_width | float | 23.5 | Camera sensor width in mm |
| current_zoom | float | 1 | Optical zoom multiplier; effective focal length is `focal_length * current_zoom` |
| current_angle | float | 90 | Camera angle in degrees; 90 is nadir/downward |
| current_height | float | optional | Camera height in meters |
Legacy flat `altitude`, `focal_length`, and `sensor_width` keys are still accepted for backward compatibility, but new clients should send `camera_config`.
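For clients migrating from the flat keys, the two payload shapes can be compared side by side. The sketch below is client-side only and assumes nothing about the server beyond the field names in the table; `upgrade_legacy_config` is a hypothetical helper, not part of the service.

```python
import json

# New style: camera values grouped under camera_config.
new_style = {
    "model_batch_size": 8,
    "camera_config": {
        "focal_length": 24,
        "sensor_width": 23.5,
        "current_zoom": 1,
        "current_angle": 90,
        "current_height": 400,
    },
}

# Legacy style: flat keys, still accepted for backward compatibility.
legacy_style = {
    "model_batch_size": 8,
    "altitude": 400,
    "focal_length": 24,
    "sensor_width": 23.5,
}

_LEGACY_KEYS = ("altitude", "focal_length", "sensor_width")


def upgrade_legacy_config(cfg: dict) -> dict:
    """Hypothetical helper: move flat camera keys into camera_config."""
    out = {k: v for k, v in cfg.items() if k not in _LEGACY_KEYS}
    camera = dict(cfg.get("camera_config") or {})
    if "altitude" in cfg:
        # Legacy altitude corresponds to current_height.
        camera.setdefault("current_height", cfg["altitude"])
    for key in ("focal_length", "sensor_width"):
        if key in cfg:
            camera.setdefault(key, cfg[key])
    if camera:
        out["camera_config"] = camera
    return out


# JSON string as it would be sent in the request config.
config_json = json.dumps(upgrade_legacy_config(legacy_style))
```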
`paths` field was removed in AZ-174 — media paths are now resolved via the Annotations service.
+1 -1
@@ -32,7 +32,7 @@ graph LR
| Cython inference pipeline | Python 3, Cython 3.1.3, OpenCV 4.10 | Near-C performance for tight detection loops while retaining Python ecosystem | Build complexity, limited IDE/debug support | Compilation step via setup.py | N/A | Low (open-source) | High — critical for postprocessing throughput |
| Dual engine strategy (TensorRT + ONNX) | TensorRT 10.11, ONNX Runtime 1.22 | Maximum GPU speed with CPU fallback; auto-conversion and caching | Two code paths; GPU-specific engine files not portable | NVIDIA GPU (CC ≥ 6.1) for TensorRT | N/A | TensorRT free for NVIDIA GPUs | High — balances performance and portability |
| FastAPI HTTP service | FastAPI, Uvicorn, Pydantic | Async SSE, auto-generated docs, fast development | Sync inference offloaded to ThreadPoolExecutor (2 workers) | Python 3.8+ | Bearer token pass-through | Low (open-source) | High — fits async streaming + sync inference pattern |
| GSD-based image tiling | OpenCV, NumPy | Preserves small object detail in large aerial images | Complex tile dedup logic; overlap increases compute | Camera metadata (altitude, focal length, sensor width) | N/A | Compute cost scales with image size | High — essential for aerial imagery use case |
| GSD-based image tiling | OpenCV, NumPy | Preserves small object detail in large aerial images | Complex tile dedup logic; overlap increases compute | Camera metadata (`camera_config`: height, angle, zoom, focal length, sensor width) | N/A | Compute cost scales with image size | High — essential for aerial imagery use case |
| Lazy engine initialization | pynvml, threading | Fast API startup; background model conversion | First request has high latency; engine may be unavailable | None | N/A | N/A | High — prevents blocking startup on slow model download/conversion |
## 3. Testing Strategy
@@ -109,7 +109,7 @@ None — internal component, consumed by API layer.
### Large Image Tiling
- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Ground Sampling Distance: `sensor_width * current_height / (focal_length * current_zoom * image_width * sin(current_angle))`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
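The tile size and overlap bullets above can be sketched as a small calculation. The clamping to a minimum of 1 pixel mirrors the `max(1, ...)` guards added to the inference code in this commit; the function name `tile_geometry` is an assumption, and `METERS_IN_TILE = 25` is taken from the documented tile size.

```python
METERS_IN_TILE = 25  # documented physical tile size in meters


def tile_geometry(gsd: float, overlap_percent: int = 20) -> tuple[int, int]:
    """Return (tile_size, stride) in pixels for a given GSD in m/px."""
    # Tile edge in pixels; never smaller than one pixel.
    tile_size = max(1, int(METERS_IN_TILE / gsd))
    # Step between tile origins; 100% overlap would otherwise give 0.
    stride = max(1, int(tile_size * (1 - overlap_percent / 100)))
    return tile_size, stride


# Hypothetical usage with a GSD of 0.125 m/px and 50% overlap.
tile_size, stride = tile_geometry(0.125, overlap_percent=50)
```

Without the stride clamp, an overlap of 100% would yield a zero step and the tiling loop could not advance.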
+8 -4
@@ -37,9 +37,13 @@ erDiagram
double tracking_intersection_threshold
int big_image_tile_overlap_percent
int model_batch_size
double altitude
bool has_camera_config
double current_height
double current_zoom
double current_angle
double focal_length
double sensor_width
double altitude
}
AIAvailabilityStatus {
@@ -107,7 +111,7 @@ Groups detections for a single frame or image tile.
### AIRecognitionConfig
Runtime configuration for inference behavior. Created from dict (API) or msgpack (internal).
Runtime configuration for inference behavior. Created from dict (API). Camera values are grouped under `camera_config` at the API boundary and expanded into `current_height`, `current_zoom`, `current_angle`, `focal_length`, and `sensor_width` internally. `altitude` remains as a legacy alias for `current_height`.
### AIAvailabilityStatus
@@ -125,7 +129,7 @@ SSE event payload. Status values: AIProcessing, AIProcessed, Error.
### AIConfigDto
API input configuration. Same fields as AIRecognitionConfig with defaults.
API input configuration. Same inference fields as `AIRecognitionConfig` with defaults, plus nested `camera_config` for GSD and physical-size filtering.
### HealthResponse
@@ -144,7 +148,7 @@ Annotation names encode media source and processing context:
| Entity | Format | Usage |
|--------|--------|-------|
| Detection/Annotation | msgpack (compact keys) | `annotation.serialize()` |
| AIRecognitionConfig | msgpack (compact keys) | `from_msgpack()` |
| AIRecognitionConfig | Python dict | `AIRecognitionConfig.from_dict()` |
| AIAvailabilityStatus | msgpack | `serialize()` |
| DetectionDto/Event | JSON (Pydantic) | HTTP API responses, SSE |
+7 -3
@@ -20,9 +20,13 @@ Data class holding all AI recognition configuration parameters, with factory met
| `tracking_intersection_threshold` | double | 0.6 | IoU threshold for overlapping detection removal |
| `model_batch_size` | int | 1 | Batch size for inference |
| `big_image_tile_overlap_percent` | int | 20 | Tile overlap percentage for large image splitting |
| `altitude` | double? | optional | Camera altitude in meters. When missing, GSD-based filtering is disabled |
| `has_camera_config` | bool | false | Whether camera parameters were supplied |
| `current_height` | double | 0.0 | Camera height in meters, from `camera_config.current_height` |
| `current_zoom` | double | 1.0 | Camera zoom multiplier |
| `current_angle` | double | 90.0 | Camera angle in degrees; 90 is nadir/downward |
| `focal_length` | double | 24 | Camera focal length in mm |
| `sensor_width` | double | 23.5 | Camera sensor width in mm |
| `altitude` / `has_altitude` | double / bool | legacy | Backward-compatible aliases for older flat camera config |
#### Methods
@@ -32,7 +36,7 @@ Data class holding all AI recognition configuration parameters, with factory met
## Internal Logic
`from_dict` applies defaults for missing keys using full descriptive key names.
`from_dict` applies defaults for missing keys using full descriptive key names. Camera parameters are read from nested `camera_config` first; legacy flat `altitude`, `focal_length`, and `sensor_width` keys remain supported for older clients.
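The lookup order described here (nested `camera_config` first, then legacy flat keys, then defaults) can be sketched in plain Python. This is a simplified illustration, not the Cython `from_dict`: it omits the camelCase aliases the real method also accepts, and `resolve_camera_fields` is a hypothetical name.

```python
DEFAULTS = {
    "focal_length": 24,
    "sensor_width": 23.5,
    "current_zoom": 1,
    "current_angle": 90,
}


def resolve_camera_fields(data: dict) -> dict:
    """Resolve camera values: nested first, legacy flat second, defaults last."""
    camera = data.get("camera_config")
    if not isinstance(camera, dict):
        camera = {}  # a non-dict camera_config is ignored

    resolved = {
        # Height has no default: nested current_height, else legacy altitude.
        "current_height": camera.get("current_height", data.get("altitude")),
    }
    for key, default in DEFAULTS.items():
        value = camera.get(key, data.get(key, default))
        # An explicit None falls back to the default as well.
        resolved[key] = default if value is None else value
    return resolved


# Hypothetical usage: an older client sending only flat keys.
fields = resolve_camera_fields({"altitude": 400, "focal_length": 35})
```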
**Removed**: `paths` field and `file_data` field were removed as part of the distributed architecture shift (AZ-174). Media paths are now resolved via the Annotations service API, not passed in config. `from_msgpack()` was also removed as it was unused.
@@ -51,7 +55,7 @@ Data class holding all AI recognition configuration parameters, with factory met
## Configuration
Camera/altitude parameters (`altitude`, `focal_length`, `sensor_width`) are used for ground sampling distance calculation in aerial image processing. If `altitude` is missing, the service skips GSD-based size filtering and does not tile large images by physical size.
Camera parameters (`camera_config.focal_length`, `camera_config.sensor_width`, `camera_config.current_zoom`, `camera_config.current_angle`, `camera_config.current_height`) are used for ground sampling distance calculation in aerial image processing. If `camera_config` is missing or height/optics are invalid, the service skips GSD-based size filtering and does not tile large images by physical size.
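The "missing or invalid" condition can be sketched as a single predicate. It follows the checks visible in the inference change in this commit (positive height, positive effective focal length, positive sensor width, positive sine of the angle, positive image width); the function name `gsd_inputs_valid` is an assumption.

```python
import math


def gsd_inputs_valid(
    has_camera_config: bool,
    current_height: float,
    focal_length: float,
    current_zoom: float,
    sensor_width: float,
    current_angle: float,
    image_width: int,
) -> bool:
    """Return True only when every input needed for GSD is usable."""
    angle_scale = math.sin(math.radians(current_angle))
    effective_focal_length = focal_length * current_zoom
    return bool(
        has_camera_config
        and current_height > 0
        and effective_focal_length > 0
        and sensor_width > 0
        and angle_scale > 0   # rejects 0 degrees and negative angles
        and image_width > 0
    )


# Hypothetical usage: defaults with a 400 m height on a 4000 px wide image.
ok = gsd_inputs_valid(True, 400, 24, 1, 23.5, 90, 4000)
```

When the predicate is false, the GSD stays at 0.0 and the image is processed as a single frame instead of being tiled by physical size.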
## External Integrations
+1 -1
@@ -90,7 +90,7 @@ Both `run_detect_image` and `run_detect_video` accept raw bytes instead of file
### Ground Sampling Distance (GSD)
`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
`GSD = sensor_width * current_height / (focal_length * current_zoom * image_width * sin(current_angle))` — meters per pixel, used for physical size filtering of aerial detections. `current_angle` is configured in degrees and defaults to 90.
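A worked instance of the formula may help show the effect of the angle term. The values below are illustrative: the optics are the documented defaults, the height of 400 m matches the test configs, and the image width of 4000 px is an assumption.

```latex
\[
\mathrm{GSD} =
\frac{\text{sensor\_width} \cdot \text{current\_height}}
     {\text{focal\_length} \cdot \text{current\_zoom} \cdot \text{image\_width} \cdot \sin(\text{current\_angle})}
\]
% Nadir view (90 degrees), zoom 1:
\[
\mathrm{GSD}_{90^\circ} =
\frac{23.5 \cdot 400}{24 \cdot 1 \cdot 4000 \cdot \sin 90^\circ}
= \frac{9400}{96000} \approx 0.098\ \text{m/px}
\]
% Oblique view (30 degrees), same optics and height:
\[
\mathrm{GSD}_{30^\circ} =
\frac{9400}{96000 \cdot \sin 30^\circ}
= \frac{9400}{48000} \approx 0.196\ \text{m/px}
\]
```

Under these assumptions, tilting the camera from 90 to 30 degrees doubles the meters covered by each pixel, which in turn halves the tile size in pixels.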
## Dependencies
+3 -2
@@ -23,7 +23,8 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
| `DetectionDto` | centerX, centerY, width, height, classNum, label, confidence | Single detection result |
| `DetectionEvent` | annotations (list[DetectionDto]), mediaId, mediaStatus, mediaPercent | SSE event payload |
| `HealthResponse` | status, aiAvailability, engineType, errorMessage | Health check response |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width | Configuration input (no `paths` field — removed in AZ-174) |
| `CameraConfigDto` | focal_length, sensor_width, current_zoom, current_angle, current_height | Camera input used for GSD and physical-size filtering |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, camera_config | Configuration input (no `paths` field — removed in AZ-174) |
### Class: TokenManager
@@ -37,7 +38,7 @@ FastAPI application entry point — exposes HTTP API for object detection on ima
| Function | Signature | Description |
|----------|-----------|-------------|
| `_merged_annotation_settings_payload` | `(raw: object) -> dict` | Merges nested AI settings from Annotations service response (handles `aiRecognitionSettings`, `cameraSettings` sub-objects and PascalCase/camelCase/snake_case aliases) |
| `_merged_annotation_settings_payload` | `(raw: object) -> dict` | Merges nested AI settings from Annotations service response (handles `aiRecognitionSettings`, `camera_config`/`cameraSettings` sub-objects and PascalCase/camelCase/snake_case aliases) |
| `_resolve_media_for_detect` | `(media_id, token_mgr, override) -> tuple[dict, str]` | Fetches user AI settings + media path from Annotations service, merges with client overrides |
| `_detect_upload_kind` | `(filename, data) -> tuple[str, str]` | Determines if upload is image or video by extension, falls back to content probing (cv2/PyAV) |
| `_post_media_record` | `(payload, bearer) -> bool` | Creates media record via `POST /api/media` on Annotations service |
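The alias handling described for `_merged_annotation_settings_payload` can be sketched as a first-match lookup over a table of accepted spellings. The alias tuples below are the ones visible in this commit; `focal_length` is left out because its alias list is not shown in the diff. `normalize_camera_keys` is a hypothetical name.

```python
CAMERA_FIELD_ALIASES = (
    ("current_height", ("current_height", "currentHeight", "CurrentHeight", "altitude", "Altitude")),
    ("sensor_width", ("sensor_width", "sensorWidth", "SensorWidth")),
    ("current_zoom", ("current_zoom", "currentZoom", "CurrentZoom")),
    ("current_angle", ("current_angle", "currentAngle", "CurrentAngle")),
)


def normalize_camera_keys(source: dict) -> dict:
    """Map any accepted spelling to its snake_case field name."""
    out = {}
    for snake, aliases in CAMERA_FIELD_ALIASES:
        for key in aliases:
            # The first alias with a non-None value wins.
            if source.get(key) is not None:
                out[snake] = source[key]
                break
    return out


# Hypothetical usage: a PascalCase/camelCase mix from an upstream service.
camera_config = normalize_camera_keys({"CurrentHeight": 300, "sensorWidth": 36})
```

Because `altitude` sits at the end of the `current_height` alias list, an explicit `current_height` takes precedence over the legacy key when both are present.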
+3 -3
@@ -83,7 +83,7 @@
**Preconditions**:
- Engine is initialized
- Config includes altitude, focal_length, sensor_width for GSD calculation
- Config includes `camera_config` with `current_height`, `focal_length`, `sensor_width`, `current_zoom`, and `current_angle` for GSD calculation
**Input data**: large-image (4000×3000)
@@ -91,7 +91,7 @@
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with large-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
| 1 | `POST /detect` with large-image and config `{"camera_config":{"current_height":400,"focal_length":24,"sensor_width":23.5,"current_zoom":1,"current_angle":90}}` | 200 OK |
| 2 | Parse response JSON | Array of detections |
| 3 | Verify detection coordinates | Bounding box coordinates are in the 0.0 to 1.0 range relative to the full original image |
@@ -167,7 +167,7 @@
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
| 1 | `POST /detect` with small-image and config `{"camera_config":{"current_height":400,"focal_length":24,"sensor_width":23.5,"current_zoom":1,"current_angle":90}}` | 200 OK |
| 2 | For each detection, compute physical size from bounding box + GSD | No detection's physical size exceeds the MaxSizeM defined for its class in classes.json |
**Expected outcome**: All returned detections have plausible physical dimensions for their class.
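The check in step 2 can be sketched as follows. Several details here are assumptions rather than confirmed behavior: the bounding box is taken to be normalized to the image (as in the `DetectionDto` width/height fields), the larger side is compared against the limit, and the `max_size_m_by_class` mapping is a hypothetical stand-in for the MaxSizeM values in classes.json.

```python
def physical_size_m(det: dict, image_w: int, image_h: int, gsd: float) -> float:
    """Largest side of a normalized bounding box, in meters."""
    width_m = det["width"] * image_w * gsd    # normalized width -> px -> m
    height_m = det["height"] * image_h * gsd  # normalized height -> px -> m
    return max(width_m, height_m)


def within_max_size(
    det: dict,
    image_w: int,
    image_h: int,
    gsd: float,
    max_size_m_by_class: dict,  # hypothetical: classNum -> MaxSizeM
) -> bool:
    """True when the detection is no larger than its class limit."""
    limit = max_size_m_by_class[det["classNum"]]
    return physical_size_m(det, image_w, image_h, gsd) <= limit


# Hypothetical usage: one detection on a 1280x720 image at 0.25 m/px.
detection = {"classNum": 0, "width": 0.1, "height": 0.05}
plausible = within_max_size(detection, 1280, 720, 0.25, {0: 40.0})
```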
+4 -3
@@ -249,7 +249,7 @@ def _health(http_client):
def _health_ai_active(data: dict) -> bool:
return data.get("aiAvailability") not in ("None", "Downloading", "Error")
return data.get("aiAvailability") == "Enabled"
def _wait_for_ai_active(http_client, timeout: float = 30) -> dict | None:
@@ -324,7 +324,8 @@ def corrupt_image():
@pytest.fixture(scope="module")
def warm_engine(http_client, image_small, auth_headers):
deadline = time.time() + 120
timeout = int(os.environ.get("E2E_ENGINE_WAIT_TIMEOUT", "900"))
deadline = time.time() + timeout
last_status = None
consecutive_errors = 0
@@ -403,4 +404,4 @@ def warm_engine(http_client, image_small, auth_headers):
th.join(timeout=1)
time.sleep(2)
pytest.fail(f"engine warm-up timed out after 120s (last status: {last_status})")
pytest.fail(f"engine warm-up timed out after {timeout}s (last status: {last_status})")
Binary file not shown.
+7 -3
@@ -41,9 +41,13 @@ def user_ai_settings(user_id):
"tracking_intersection_threshold": 0.6,
"model_batch_size": 8,
"big_image_tile_overlap_percent": 20,
"altitude": 400,
"focal_length": 24,
"sensor_width": 23.5,
"camera_config": {
"focal_length": 24,
"sensor_width": 23.5,
"current_zoom": 1,
"current_angle": 90,
"current_height": 400,
},
}
+11 -1
@@ -46,7 +46,17 @@ def test_nft_perf_03_tiling_overhead_large_image(
_, small_ms = image_detect(image_small, "small.jpg", timeout=20)
_, large_ms = image_detect(
image_large, "large.jpg",
config=json.dumps({"altitude": 400, "focal_length": 24, "sensor_width": 23.5}),
config=json.dumps(
{
"camera_config": {
"focal_length": 24,
"sensor_width": 23.5,
"current_zoom": 1,
"current_angle": 90,
"current_height": 400,
}
}
),
timeout=20,
)
assert large_ms < 30_000.0
+7 -3
@@ -149,9 +149,13 @@ def test_ft_p_07_physical_size_filtering_ac4(image_detect, image_small, warm_eng
gsd = (sensor_width * altitude) / (focal_length * image_width_px)
cfg = json.dumps(
{
"altitude": altitude,
"focal_length": focal_length,
"sensor_width": sensor_width,
"camera_config": {
"focal_length": focal_length,
"sensor_width": sensor_width,
"current_zoom": 1,
"current_angle": 90,
"current_height": altitude,
},
}
)
body, _ = image_detect(image_small, "img.jpg", config=cfg, timeout=_DETECT_SLOW_TIMEOUT)
+43 -15
@@ -1,9 +1,9 @@
"""
AZ-178: True streaming video detection — e2e tests.
Both tests upload video_test01.mp4 (12 MB), wait for the first SSE event,
then stop. The goal is to prove the service starts and produces detections,
not to process the whole file.
Both tests upload video_test01.mp4 (12 MB), wait for the first SSE event
to prove streaming starts early, then keep draining SSE until the terminal
event so later tests do not overlap with background video inference.
Run with: pytest e2e/tests/test_streaming_video_upload.py -s -v
"""
@@ -18,7 +18,8 @@ import sseclient
FIXTURES_DIR = Path(__file__).resolve().parent.parent / "fixtures"
_TIMEOUT = 5.0
_STOP_AFTER = 5
_DRAIN_TIMEOUT = 45.0
_VIDEO_CONFIG = json.dumps({"model_batch_size": 1, "frame_period_recognition": 100})
def _fixture_path(name: str) -> str:
@@ -39,10 +40,19 @@ def _chunked_reader(path: str, chunk_size: int = 64 * 1024):
def _start_sse_listener(
http_client, channel_id: str, auth_headers: dict
) -> tuple[list[dict], list[BaseException], threading.Event]:
) -> tuple[
list[dict],
list[BaseException],
threading.Event,
threading.Event,
threading.Event,
threading.Thread,
]:
events: list[dict] = []
errors: list[BaseException] = []
first_event = threading.Event()
terminal_event = threading.Event()
listener_done = threading.Event()
connected = threading.Event()
def _listen():
@@ -50,7 +60,7 @@ def _start_sse_listener(
with http_client.get(
f"/detect/events/{channel_id}",
stream=True,
timeout=_TIMEOUT + 2,
timeout=_DRAIN_TIMEOUT + 5,
headers=auth_headers,
) as resp:
resp.raise_for_status()
@@ -58,8 +68,11 @@ def _start_sse_listener(
for event in sseclient.SSEClient(resp).events():
if not event.data or not str(event.data).strip():
continue
events.append(json.loads(event.data))
if len(events) >= _STOP_AFTER:
data = json.loads(event.data)
events.append(data)
first_event.set()
if data.get("mediaStatus") in ("AIProcessed", "Error"):
terminal_event.set()
first_event.set()
break
except BaseException as exc:
@@ -67,21 +80,24 @@ def _start_sse_listener(
finally:
connected.set()
first_event.set()
listener_done.set()
th = threading.Thread(target=_listen, daemon=True)
th.start()
connected.wait(timeout=3)
return events, errors, first_event
return events, errors, first_event, terminal_event, listener_done, th
@pytest.mark.timeout(10)
@pytest.mark.timeout(60)
def test_streaming_video_detections_appear_during_upload(
warm_engine, http_client, auth_headers
):
# Arrange
video_path = _fixture_path("video_test01.mp4")
channel_id = str(uuid.uuid4())
events, errors, first_event = _start_sse_listener(http_client, channel_id, auth_headers)
events, errors, first_event, terminal_event, listener_done, th = _start_sse_listener(
http_client, channel_id, auth_headers
)
# Act
r = http_client.post(
@@ -91,27 +107,34 @@ def test_streaming_video_detections_appear_during_upload(
**auth_headers,
"X-Channel-Id": channel_id,
"X-Filename": "video_test01.mp4",
"X-Config": _VIDEO_CONFIG,
"Content-Type": "application/octet-stream",
},
timeout=8,
)
assert r.status_code == 202
first_event.wait(timeout=_TIMEOUT)
assert first_event.wait(timeout=_TIMEOUT)
assert terminal_event.wait(timeout=_DRAIN_TIMEOUT)
assert listener_done.wait(timeout=2)
th.join(timeout=2)
# Assert
assert not errors, f"SSE thread error: {errors}"
assert len(events) >= 1, "Expected at least one SSE event within 5s"
assert events[-1].get("mediaStatus") == "AIProcessed"
print(f"\n First {len(events)} SSE events:")
for e in events:
print(f" {e}")
@pytest.mark.timeout(10)
@pytest.mark.timeout(60)
def test_non_faststart_video_still_works(warm_engine, http_client, auth_headers):
# Arrange
video_path = _fixture_path("video_test01.mp4")
channel_id = str(uuid.uuid4())
events, errors, first_event = _start_sse_listener(http_client, channel_id, auth_headers)
events, errors, first_event, terminal_event, listener_done, th = _start_sse_listener(
http_client, channel_id, auth_headers
)
# Act
r = http_client.post(
@@ -121,16 +144,21 @@ def test_non_faststart_video_still_works(warm_engine, http_client, auth_headers)
**auth_headers,
"X-Channel-Id": channel_id,
"X-Filename": "video_test01_plain.mp4",
"X-Config": _VIDEO_CONFIG,
"Content-Type": "application/octet-stream",
},
timeout=8,
)
assert r.status_code == 202
first_event.wait(timeout=_TIMEOUT)
assert first_event.wait(timeout=_TIMEOUT)
assert terminal_event.wait(timeout=_DRAIN_TIMEOUT)
assert listener_done.wait(timeout=2)
th.join(timeout=2)
# Assert
assert not errors, f"SSE thread error: {errors}"
assert len(events) >= 1, "Expected at least one SSE event within 5s"
assert events[-1].get("mediaStatus") == "AIProcessed"
print(f"\n First {len(events)} SSE events:")
for e in events:
print(f" {e}")
+9 -1
@@ -3,7 +3,15 @@ import json
import pytest
_TILING_TIMEOUT = 120
_GSD = {"altitude": 400, "focal_length": 24, "sensor_width": 23.5}
_GSD = {
"camera_config": {
"focal_length": 24,
"sensor_width": 23.5,
"current_zoom": 1,
"current_angle": 90,
"current_height": 400,
}
}
_DUP_THRESHOLD = 0.01
+3
@@ -72,6 +72,9 @@ def video_events(warm_engine, http_client, auth_headers):
**auth_headers,
"X-Channel-Id": channel_id,
"X-Filename": "video_test01.mp4",
"X-Config": json.dumps(
{"model_batch_size": 1, "frame_period_recognition": 100}
),
"Content-Type": "application/octet-stream",
},
timeout=15,
+5
@@ -12,6 +12,11 @@ cdef class AIRecognitionConfig:
cdef public int model_batch_size
cdef public bint has_camera_config
cdef public double current_height
cdef public double current_zoom
cdef public double current_angle
cdef public bint has_altitude
cdef public double altitude
cdef public double focal_length
+64 -9
@@ -9,9 +9,12 @@ cdef class AIRecognitionConfig:
tracking_intersection_threshold,
model_batch_size,
big_image_tile_overlap_percent,
camera_config,
altitude,
focal_length,
sensor_width
sensor_width,
current_zoom,
current_angle
):
self.frame_period_recognition = frame_period_recognition
self.frame_recognition_seconds = frame_recognition_seconds
@@ -25,10 +28,15 @@ cdef class AIRecognitionConfig:
self.big_image_tile_overlap_percent = big_image_tile_overlap_percent
self.has_altitude = altitude is not None
self.altitude = 0.0 if altitude is None else float(altitude)
self.focal_length = focal_length
self.sensor_width = sensor_width
self.has_camera_config = camera_config is not None or altitude is not None
self.current_height = 0.0 if altitude is None else float(altitude)
self.current_zoom = float(current_zoom)
self.current_angle = float(current_angle)
self.has_altitude = self.has_camera_config
self.altitude = self.current_height
self.focal_length = float(focal_length)
self.sensor_width = float(sensor_width)
def __str__(self):
return (f'frame_seconds : {self.frame_recognition_seconds}, distance_confidence : {self.tracking_distance_confidence}, '
@@ -37,13 +45,57 @@ cdef class AIRecognitionConfig:
f'frame_period_recognition : {self.frame_period_recognition}, '
f'big_image_tile_overlap_percent: {self.big_image_tile_overlap_percent}, '
f'model_batch_size: {self.model_batch_size}, '
f'altitude: {self.altitude if self.has_altitude else None}, '
f'camera_config: {self.has_camera_config}, '
f'current_height: {self.current_height if self.has_camera_config else None}, '
f'current_zoom: {self.current_zoom}, '
f'current_angle: {self.current_angle}, '
f'focal_length: {self.focal_length}, '
f'sensor_width: {self.sensor_width}'
)
@staticmethod
cdef AIRecognitionConfig from_dict(dict data):
cdef object camera_config = data.get("camera_config", data.get("cameraConfig", None))
if camera_config is not None and not isinstance(camera_config, dict):
camera_config = None
cdef object altitude = data.get("altitude", None)
cdef object focal_length = data.get("focal_length", data.get("focalLength", 24))
cdef object sensor_width = data.get("sensor_width", data.get("sensorWidth", 23.5))
cdef object current_zoom = data.get("current_zoom", data.get("currentZoom", 1))
cdef object current_angle = data.get("current_angle", data.get("currentAngle", 90))
if camera_config is not None:
altitude = camera_config.get(
"current_height",
camera_config.get("currentHeight", camera_config.get("altitude", altitude)),
)
focal_length = camera_config.get(
"focal_length",
camera_config.get("focalLength", focal_length),
)
sensor_width = camera_config.get(
"sensor_width",
camera_config.get("sensorWidth", sensor_width),
)
current_zoom = camera_config.get(
"current_zoom",
camera_config.get("currentZoom", current_zoom),
)
current_angle = camera_config.get(
"current_angle",
camera_config.get("currentAngle", current_angle),
)
if focal_length is None:
focal_length = 24
if sensor_width is None:
sensor_width = 23.5
if current_zoom is None:
current_zoom = 1
if current_angle is None:
current_angle = 90
return AIRecognitionConfig(
data.get("frame_period_recognition", 4),
data.get("frame_recognition_seconds", 2),
@@ -57,7 +109,10 @@ cdef class AIRecognitionConfig:
data.get("big_image_tile_overlap_percent", 20),
data.get("altitude", None),
data.get("focal_length", 24),
data.get("sensor_width", 23.5)
camera_config,
altitude,
focal_length,
sensor_width,
current_zoom,
current_angle
)
+24 -6
@@ -5,6 +5,7 @@ import av
import cv2
import numpy as np
cimport constants_inf
from libc.math cimport sin
from ai_availability_status cimport AIAvailabilityEnum, AIAvailabilityStatus
from annotation cimport Detection, Annotation
@@ -309,25 +310,42 @@ cdef class Inference:
cdef _append_image_frame_entries(self, AIRecognitionConfig ai_config, list all_frame_data, frame, str original_media_name):
cdef double ground_sampling_distance
cdef double angle_radians
cdef double angle_scale
cdef double effective_focal_length
cdef int model_h, model_w
cdef int img_h, img_w
cdef bint has_gsd
model_h, model_w = self.engine.get_input_shape()
img_h, img_w, _ = frame.shape
has_gsd = ai_config.has_altitude and ai_config.focal_length > 0 and ai_config.sensor_width > 0 and img_w > 0
angle_radians = ai_config.current_angle * 3.141592653589793 / 180.0
angle_scale = sin(angle_radians)
effective_focal_length = ai_config.focal_length * ai_config.current_zoom
has_gsd = (
ai_config.has_camera_config
and ai_config.current_height > 0
and effective_focal_length > 0
and ai_config.sensor_width > 0
and angle_scale > 0
and img_w > 0
)
ground_sampling_distance = 0.0
if has_gsd:
ground_sampling_distance = ai_config.sensor_width * ai_config.altitude / (ai_config.focal_length * img_w)
ground_sampling_distance = (
ai_config.sensor_width
* ai_config.current_height
/ (effective_focal_length * img_w * angle_scale)
)
constants_inf.log(<str>f'ground sampling distance: {ground_sampling_distance}')
else:
constants_inf.log(<str>'ground sampling distance: skipped (altitude unavailable)')
constants_inf.log(<str>'ground sampling distance: skipped (camera_config unavailable)')
if img_h <= 1.5 * model_h and img_w <= 1.5 * model_w:
all_frame_data.append((frame, original_media_name, f'{original_media_name}_000000', ground_sampling_distance))
else:
if not has_gsd:
all_frame_data.append((frame, original_media_name, f'{original_media_name}_000000', ground_sampling_distance))
return
tile_size = int(constants_inf.METERS_IN_TILE / ground_sampling_distance)
tile_size = max(1, int(constants_inf.METERS_IN_TILE / ground_sampling_distance))
constants_inf.log(<str> f'calc tile size: {tile_size}')
res = self.split_to_tiles(frame, original_media_name, tile_size, ai_config.big_image_tile_overlap_percent)
for tile_frame, omn, tile_name in res:
@@ -362,8 +380,8 @@ cdef class Inference:
cdef split_to_tiles(self, frame, str media_stem, tile_size, overlap_percent):
constants_inf.log(<str>f'splitting image {media_stem} to tiles...')
img_h, img_w, _ = frame.shape
stride_w = int(tile_size * (1 - overlap_percent / 100))
stride_h = int(tile_size * (1 - overlap_percent / 100))
stride_w = max(1, int(tile_size * (1 - overlap_percent / 100)))
stride_h = max(1, int(tile_size * (1 - overlap_percent / 100)))
results = []
original_media_name = media_stem
+48 -3
@@ -155,6 +155,14 @@ class HealthResponse(BaseModel):
errorMessage: Optional[str] = None
class CameraConfigDto(BaseModel):
focal_length: float = 24
sensor_width: float = 23.5
current_zoom: float = 1
current_angle: float = 90
current_height: Optional[float] = None
class AIConfigDto(BaseModel):
frame_period_recognition: int = 4
frame_recognition_seconds: int = 2
@@ -164,6 +172,7 @@ class AIConfigDto(BaseModel):
tracking_intersection_threshold: float = 0.6
model_batch_size: int = 8
big_image_tile_overlap_percent: int = 20
camera_config: Optional[CameraConfigDto] = None
altitude: Optional[float] = None
focal_length: float = 24
sensor_width: float = 23.5
@@ -218,9 +227,12 @@ _AI_SETTINGS_FIELD_KEYS = (
"BigImageTileOverlapPercent",
),
),
)
_CAMERA_SETTINGS_FIELD_KEYS = (
(
"altitude",
("altitude", "Altitude"),
"current_height",
("current_height", "currentHeight", "CurrentHeight", "altitude", "Altitude"),
),
(
"focal_length",
@@ -230,6 +242,14 @@ _AI_SETTINGS_FIELD_KEYS = (
"sensor_width",
("sensor_width", "sensorWidth", "SensorWidth"),
),
(
"current_zoom",
("current_zoom", "currentZoom", "CurrentZoom"),
),
(
"current_angle",
("current_angle", "currentAngle", "CurrentAngle"),
),
)
@@ -249,6 +269,21 @@ def _merged_annotation_settings_payload(raw: object) -> dict:
if key in merged and merged[key] is not None:
out[snake] = merged[key]
break
camera_source = {}
for key in ("camera_config", "cameraConfig", "cameraSettings"):
value = raw.get(key)
if isinstance(value, dict):
camera_source.update(value)
camera_merged = dict(merged)
camera_merged.update(camera_source)
camera_config = {}
for snake, aliases in _CAMERA_SETTINGS_FIELD_KEYS:
for key in aliases:
if key in camera_merged and camera_merged[key] is not None:
camera_config[snake] = camera_merged[key]
break
if camera_config:
out["camera_config"] = camera_config
return out
@@ -306,7 +341,13 @@ def _resolve_media_for_detect(
cfg.update(_merged_annotation_settings_payload(raw))
if override is not None:
for k, v in override.model_dump(exclude_defaults=True).items():
cfg[k] = v
if k == "camera_config" and isinstance(v, dict):
existing = cfg.get("camera_config")
camera_cfg = dict(existing) if isinstance(existing, dict) else {}
camera_cfg.update(v)
cfg[k] = camera_cfg
else:
cfg[k] = v
media_path = annotations_client.fetch_media_path(media_id, bearer)
if not media_path:
raise HTTPException(
@@ -515,6 +556,8 @@ async def detect_image(
_post_annotation_to_service(token_mgr, content_hash, annotation, dtos)
def run_sync():
if not inf.is_engine_ready:
raise RuntimeError("Detection service unavailable")
inf.run_detect_image(image_bytes, ai_cfg, media_name, on_annotation)
try:
@@ -609,6 +652,8 @@ async def detect_video_upload(
_post_annotation_to_service(token_mgr, mid, annotation, dtos)
def run_inference():
if not inf.is_engine_ready:
raise RuntimeError("Detection service unavailable")
inf.run_detect_video_stream(buffer, ai_cfg, media_name, on_annotation, lambda *_: None)
inference_future = loop.run_in_executor(executor, run_inference)
+37
@@ -1,24 +1,61 @@
def test_ai_config_from_dict_defaults():
# Arrange
from inference import ai_config_from_dict
# Act
cfg = ai_config_from_dict({})
# Assert
assert cfg.model_batch_size == 8
assert cfg.frame_period_recognition == 4
assert cfg.frame_recognition_seconds == 2
assert cfg.has_camera_config is False
assert cfg.has_altitude is False
def test_ai_config_from_dict_altitude_override_sets_flag():
# Arrange
from inference import ai_config_from_dict
# Act
cfg = ai_config_from_dict({"altitude": 400})
# Assert
assert cfg.has_camera_config is True
assert cfg.has_altitude is True
assert cfg.altitude == 400
assert cfg.current_height == 400
def test_ai_config_from_dict_overrides():
# Arrange
from inference import ai_config_from_dict
# Act
cfg = ai_config_from_dict({"model_batch_size": 4, "probability_threshold": 0.5})
# Assert
assert cfg.model_batch_size == 4
assert cfg.probability_threshold == 0.5
def test_ai_config_from_dict_camera_config_sets_physical_filter_fields():
# Arrange
from inference import ai_config_from_dict
# Act
cfg = ai_config_from_dict(
{
"camera_config": {
"focal_length": 35,
"sensor_width": 36,
"current_zoom": 2,
"current_angle": 80,
"current_height": 300,
}
}
)
# Assert
assert cfg.has_camera_config is True
assert cfg.current_height == 300
assert cfg.focal_length == 35
assert cfg.sensor_width == 36
assert cfg.current_zoom == 2
assert cfg.current_angle == 80
+38 -5
@@ -61,7 +61,9 @@ def test_merged_annotation_settings_pascal_case():
# Assert
assert out["frame_period_recognition"] == 5
assert out["probability_threshold"] == 0.4
assert out["altitude"] == 300
assert out["camera_config"]["current_height"] == 300
assert out["camera_config"]["focal_length"] == 35
assert out["camera_config"]["sensor_width"] == 36
def test_merged_annotation_nested_sections():
@@ -76,7 +78,7 @@ def test_merged_annotation_nested_sections():
out = _merged_annotation_settings_payload(raw)
# Assert
assert out["model_batch_size"] == 4
assert out["altitude"] == 100
assert out["camera_config"]["current_height"] == 100
def test_resolve_media_for_detect_uses_api_path_and_defaults_when_api_empty():
@@ -105,7 +107,7 @@ def test_resolve_media_for_detect_override_wins():
mock_ann = MagicMock()
mock_ann.fetch_user_ai_settings.return_value = {
"probabilityThreshold": 0.2,
"altitude": 500,
"camera_config": {"current_height": 500},
}
mock_ann.fetch_media_path.return_value = "/m/v.mp4"
with patch("main.annotations_client", mock_ann):
@@ -113,11 +115,42 @@ def test_resolve_media_for_detect_override_wins():
cfg, path = main._resolve_media_for_detect("vid-1", tm, override)
# Assert
assert cfg["probability_threshold"] == 0.99
assert cfg["altitude"] == 500
assert cfg["camera_config"]["current_height"] == 500
assert path == "/m/v.mp4"
assert "paths" not in cfg
def test_resolve_media_for_detect_merges_camera_config_override():
# Arrange
import main
tm = main.TokenManager(_access_jwt(), "")
override = main.AIConfigDto(
camera_config=main.CameraConfigDto(current_height=500)
)
mock_ann = MagicMock()
mock_ann.fetch_user_ai_settings.return_value = {
"camera_config": {
"focal_length": 35,
"sensor_width": 36,
"current_zoom": 2,
"current_angle": 80,
"current_height": 300,
}
}
mock_ann.fetch_media_path.return_value = "/m/v.mp4"
with patch("main.annotations_client", mock_ann):
# Act
cfg, path = main._resolve_media_for_detect("vid-1", tm, override)
# Assert
assert cfg["camera_config"]["current_height"] == 500
assert cfg["camera_config"]["focal_length"] == 35
assert cfg["camera_config"]["sensor_width"] == 36
assert cfg["camera_config"]["current_zoom"] == 2
assert cfg["camera_config"]["current_angle"] == 80
assert path == "/m/v.mp4"
def test_resolve_media_for_detect_omits_altitude_when_not_provided():
# Arrange
import main
@@ -130,7 +163,7 @@ def test_resolve_media_for_detect_omits_altitude_when_not_provided():
# Act
cfg, path = main._resolve_media_for_detect("vid-2", tm, None)
# Assert
assert "altitude" not in cfg
assert "camera_config" not in cfg
assert cfg["probability_threshold"] == 0.2
assert path == "/m/v.mp4"