detections-semantic/_docs/00_problem/acceptance_criteria.md

# Latency
- Tier 1 (fast probe / YOLO26 + YOLOE-26 detection): ≤100ms per frame on Jetson Orin Nano Super
- Tier 2 (fast confirmation / custom CNN classifier): ≤200ms per region of interest on Jetson Orin Nano Super
- Tier 3 (optional deep analysis / VLM): ≤5 seconds per region of interest on Jetson Orin Nano Super
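
The tier budgets above could be checked with a small timing harness. The following is a minimal sketch, not the project's benchmark: `measure_latency` and `within_budget` are hypothetical helper names, and because the criteria do not say whether the bound applies to the mean, a percentile, or the worst case, the sketch assumes the worst observed run.

```python
import time
from typing import Callable

# Budgets in seconds, taken from the Latency criteria above.
LATENCY_BUDGET_S = {"tier1": 0.100, "tier2": 0.200, "tier3": 5.0}


def measure_latency(fn: Callable[[], object], runs: int = 20) -> float:
    """Return the worst wall-clock latency of fn over several runs.

    Assumption: the budget is treated as a worst-case bound.
    """
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # e.g. one inference call on one frame or region of interest
        worst = max(worst, time.perf_counter() - start)
    return worst


def within_budget(tier: str, latency_s: float) -> bool:
    """Compare a measured latency against the budget for the given tier."""
    return latency_s <= LATENCY_BUDGET_S[tier]
```

In practice `fn` would wrap a single inference call on the target Jetson Orin Nano Super, since timings from a development machine say little about the target hardware.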

# YOLO Object Detection — New Classes
- New classes added to existing YOLO model: black entrances (various sizes), piles of tree branches, footpaths, roads, trees, tree blocks
- Detection performance for new classes: precision ≥80% and recall ≥80% on the validation set, in line with the existing YOLO baseline
- New classes must not degrade detection performance of existing classes

# Semantic Detection Performance (Initial Release)
- Recall on concealed positions: ≥60% (accept a high false-positive rate at first, then reduce it over iterations)
- Precision on concealed positions: ≥20% as the initial target (the operator filters the candidates)
- Footpath detection recall: ≥70%
- Baseline reference: existing YOLO achieves P=81.6%, R=85.2% on non-masked objects
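
The initial-release thresholds above reduce to standard precision and recall over true-positive, false-positive, and false-negative counts. A minimal sketch of such a check follows; the `Counts` type and `meets_initial_targets` function are illustrative names, and how detections are matched to ground truth (for example, the IoU threshold) is left open because the criteria do not specify it.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: Counts) -> float:
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def meets_initial_targets(concealed: Counts, footpath: Counts) -> bool:
    """Check the initial-release thresholds listed above."""
    return (
        recall(concealed) >= 0.60
        and precision(concealed) >= 0.20
        and recall(footpath) >= 0.70
    )
```

The low precision floor reflects the stated design choice: the operator filters candidates, so missed concealed positions cost more than extra false alarms.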

# Scan Algorithm
- Level 1 wide-area scan covers the planned route with left-right camera sweep at medium zoom
- Points of interest detected during Level 1: footpaths, tree rows, branch piles, black entrances, houses with vehicles/traces, roads on snow/terrain/forest
- Transition from Level 1 to Level 2 within 2 seconds of POI detection (includes physical zoom transition)
- Level 2 maintains camera lock on POI while UAV continues flight (gimbal compensates for aircraft motion)
- Path-following mode: camera pans along detected footpath at a rate that keeps the path visible and centered
- Endpoint hold: camera maintains position on the path endpoint for the duration of VLM analysis (up to 2 seconds)
- Return to Level 1 after analysis completes or configurable timeout (default 5 seconds per POI)
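
The Level 1 / Level 2 behaviour described above can be read as a two-state machine. The sketch below is one possible reading under stated assumptions: `ScanController` and its method names are hypothetical, time is passed in explicitly as seconds, and the physical zoom transition and path-following are not modelled.

```python
from enum import Enum, auto
from typing import Optional

POI_TIMEOUT_S = 5.0  # default per-POI timeout from the criteria above


class ScanState(Enum):
    LEVEL1_SWEEP = auto()  # wide-area left-right sweep at medium zoom
    LEVEL2_LOCK = auto()   # camera locked on a point of interest


class ScanController:
    def __init__(self, poi_timeout_s: float = POI_TIMEOUT_S) -> None:
        self.state = ScanState.LEVEL1_SWEEP
        self.poi_timeout_s = poi_timeout_s
        self._lock_started_at: Optional[float] = None

    def on_poi_detected(self, now: float) -> None:
        # Assumption: POIs seen while already locked are ignored here;
        # a real system would presumably push them onto the POI queue.
        if self.state is ScanState.LEVEL1_SWEEP:
            self.state = ScanState.LEVEL2_LOCK
            self._lock_started_at = now

    def on_analysis_complete(self) -> None:
        self._return_to_level1()

    def tick(self, now: float) -> None:
        """Return to Level 1 once the per-POI timeout has elapsed."""
        if (
            self.state is ScanState.LEVEL2_LOCK
            and self._lock_started_at is not None
            and now - self._lock_started_at >= self.poi_timeout_s
        ):
            self._return_to_level1()

    def _return_to_level1(self) -> None:
        self.state = ScanState.LEVEL1_SWEEP
        self._lock_started_at = None
```

Passing `now` in rather than reading the clock inside the controller keeps the timeout logic testable without waiting in real time.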

# Camera Control
- Gimbal control module sends pan/tilt/zoom commands to ViewPro A40
- Gimbal command latency: ≤500ms from decision to physical camera movement
- Zoom transitions: medium zoom (Level 1) to high zoom (Level 2) within 2 seconds (physical constraint)
- Path-following accuracy: detected footpath stays within center 50% of frame during pan
- Smooth gimbal transitions (no jerky movements that blur the image)
- Queue management: system maintains ordered list of POIs, prioritized by confidence and proximity to current camera position
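
The queue-management criterion above orders POIs by confidence and by proximity to the current camera position. A minimal sketch of one way to combine the two follows. The linear score, the `distance_weight` value, and the use of pan/tilt angles as the proximity measure are all assumptions; the criteria do not define how the two factors are weighted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Poi:
    name: str
    confidence: float  # detector confidence in [0, 1]
    pan_deg: float     # gimbal pan angle needed to centre the POI
    tilt_deg: float    # gimbal tilt angle needed to centre the POI


def score(poi: Poi, cam_pan: float, cam_tilt: float,
          distance_weight: float = 0.01) -> float:
    """Higher is better: confidence minus a penalty for angular distance."""
    dist = math.hypot(poi.pan_deg - cam_pan, poi.tilt_deg - cam_tilt)
    return poi.confidence - distance_weight * dist


def pop_next(queue: List[Poi], cam_pan: float, cam_tilt: float) -> Optional[Poi]:
    """Remove and return the best POI for the current camera pose."""
    if not queue:
        return None
    best = max(queue, key=lambda p: score(p, cam_pan, cam_tilt))
    queue.remove(best)
    return best
```

Scores are recomputed on every pop rather than stored in a heap, because proximity changes whenever the camera moves.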

# Semantic Analysis Pipeline
- Consumes YOLO detections as input: uses detected footpaths, roads, branch piles, entrances, trees as primitives for reasoning
- Distinguishes fresh footpaths from stale ones (visual freshness assessment)
- Traces footpaths to endpoints and identifies concealed structures at those endpoints
- Handles path intersections by following the freshest or most promising branch
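
Branch selection at a path intersection could be sketched as below. This is an illustration only: the `freshness` score is a hypothetical output of the visual freshness assessment, and ranking by freshness first with detection confidence as the tie-breaker is an assumed interpretation of "most promising".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathBranch:
    branch_id: str
    freshness: float             # hypothetical score in [0, 1]; higher is fresher
    detection_confidence: float  # YOLO confidence for the footpath segment


def choose_branch(branches: List[PathBranch],
                  min_freshness: float = 0.0) -> Optional[PathBranch]:
    """Pick the branch to follow at an intersection.

    Branches below min_freshness are treated as stale and skipped.
    Returns None when no branch qualifies.
    """
    candidates = [b for b in branches if b.freshness >= min_freshness]
    if not candidates:
        return None
    return max(candidates, key=lambda b: (b.freshness, b.detection_confidence))
```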

# Resource Constraints
- Total RAM usage (semantic module + VLM): ≤6GB on Jetson Orin Nano Super
- Must coexist with running YOLO inference pipeline without degrading YOLO performance
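
A simple budget check for the RAM limit above might look like the following sketch. It assumes "6GB" means 6 GiB, and the per-component figures passed in would have to be measured on the device. The function name is hypothetical.

```python
from typing import Dict

# Assumption: the 6GB limit is interpreted as 6 GiB.
RAM_BUDGET_BYTES = 6 * 1024**3


def total_within_budget(usage_bytes: Dict[str, int]) -> bool:
    """True if the summed usage of the semantic module and VLM fits the budget."""
    return sum(usage_bytes.values()) <= RAM_BUDGET_BYTES
```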

# Data & Training
- Training dataset: hundreds to thousands of annotated examples across all seasons and terrain types
- New YOLO classes require dedicated annotation effort for: black entrances, branch piles, footpaths, roads, trees, tree blocks
- Dataset assembly timeline: 1.5 months, 5 hours/day manual annotation effort available
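
The annotation capacity implied by the timeline above can be estimated with simple arithmetic. In this sketch the number of annotation days per month and the time per annotated example are assumptions, not figures from the criteria.

```python
def annotation_hours(months: float = 1.5,
                     hours_per_day: float = 5.0,
                     days_per_month: int = 30) -> float:
    """Total annotation hours; days_per_month is an assumption."""
    return months * days_per_month * hours_per_day


def estimated_examples(total_hours: float, seconds_per_example: float) -> int:
    """Examples that fit in the hours at an assumed per-example cost."""
    return int(total_hours * 3600 // seconds_per_example)


print(annotation_hours())                   # 225.0 (annotating every day)
print(annotation_hours(days_per_month=22))  # 165.0 (working days only)
print(estimated_examples(225.0, 120))       # 6750 at a hypothetical 2 min/example
```

At the hypothetical two minutes per example, the budget supports the stated "hundreds to thousands" of examples. Dense annotations such as footpaths may take considerably longer per image.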