# Acceptance Test: Single Image Processing Performance (<5 seconds)

## Summary

Validate the AC-7 requirement that each individual image is processed in less than 5 seconds on target hardware (NVIDIA RTX 2060/3070).

## Linked Acceptance Criteria

**AC-7**: Less than 5 seconds for processing one image.

## Preconditions

- ASTRAL-Next system deployed on target hardware
- Hardware: NVIDIA RTX 2060 (minimum) or RTX 3070 (recommended)
- TensorRT FP16 optimized models loaded
- System warmed up (first frame excluded from timing)
- No other GPU-intensive processes running

## Test Data

- **Test Images**: AD000001-AD000010 (representative sample), plus AD000032-033 for the hard scenario
- **Scenarios**:
  - Easy: Normal overlap, clear features (AD000001-002)
  - Medium: Agricultural texture (AD000005-006)
  - Hard: Minimal overlap (AD000032-033)

## Performance Breakdown Target

```
L1 SuperPoint+LightGlue:        50-150ms
L2 AnyLoc (keyframes only):     150-200ms
L3 LiteSAM (keyframes only):    60-100ms
Factor Graph Update:            50-100ms
Overhead (I/O, coordination):   50-100ms
-----------------------------------
Total (L1 only):                <500ms
Total (L1+L2+L3):               <700ms
Safety Margin:                  4300ms
AC-7 Limit:                     5000ms
```

## Test Steps

### Step 1: Measure Easy Scenario Performance

**Action**: Process AD000001 → AD000002 (normal overlap, clear features)

**Expected Result**:

```
Image Load:     <50ms
L1 Processing:  80-120ms
Factor Graph:   30-50ms
Result Output:  <20ms
---
Total:          <240ms
Status:         WELL_UNDER_LIMIT (4.8% of budget)
```

### Step 2: Measure Medium Scenario Performance

**Action**: Process AD000005 → AD000006 (agricultural texture)

**Expected Result**:

```
Image Load:     <50ms
L1 Processing:  100-150ms (more features)
Factor Graph:   40-60ms
Result Output:  <20ms
---
Total:          <280ms
Status:         UNDER_LIMIT (5.6% of budget)
```

### Step 3: Measure Hard Scenario Performance

**Action**: Process AD000032 → AD000033 (220.6m jump, minimal overlap)

**Expected Result**:

```
Image Load:     <50ms
L1 Processing:  150-200ms (adaptive depth)
L1 Confidence:  LOW → Triggers L2
L2 Processing:  150-200ms
L3 Refinement:  80-120ms
Factor Graph:   80-120ms (more complex)
Result Output:  <30ms
---
Total:          <720ms
Status:         UNDER_LIMIT (14.4% of budget)
```

### Step 4: Measure Worst-Case Performance

**Action**: Process with all layers active + large factor graph

**Expected Result**:

```
Image Load:     80ms
L1 Processing:  200ms
L2 Processing:  200ms
L3 Refinement:  120ms
Factor Graph:   150ms (200+ nodes)
Result Output:  50ms
---
Total:          ≤800ms
Status:         UNDER_LIMIT (16% of budget)
```

### Step 5: Statistical Performance Analysis

**Action**: Process 10 representative images, calculate statistics

**Expected Result**:

```
Mean processing time:    350ms
Median processing time:  280ms
90th percentile:         500ms
95th percentile:         650ms
99th percentile:         800ms
Max:                     <900ms
All:                     <5000ms (AC-7 requirement)
Status:                  PASS
```

### Step 6: Verify TensorRT Optimization Impact

**Action**: Compare TensorRT FP16 vs PyTorch FP32 performance

**Expected Result**:

```
PyTorch FP32 (baseline):    800-1200ms per image
TensorRT FP16 (optimized):  250-400ms per image
Speedup:                    2.5-3.5x

Without TensorRT: Under the AC-7 limit, but with reduced margin
With TensorRT:    Comfortably passes AC-7
```

## Pass/Fail Criteria

**PASS if**:

- 100% of images process in <5000ms
- Mean processing time <1000ms (20% of budget)
- 99th percentile <2000ms (40% of budget)
- TensorRT FP16 optimization active and verified
- Performance consistent across easy/medium/hard scenarios

**FAIL if**:

- ANY image takes ≥5000ms
- Mean processing time >2000ms
- System cannot maintain <5s with TensorRT optimization
- Performance degrades over time (memory leak)

## Hardware Requirements Validation

### RTX 2060 (Minimum)

- VRAM: 6GB
- Expected performance: 90th percentile <1000ms
- Status: Meets AC-7 with optimization

### RTX 3070 (Recommended)

- VRAM: 8GB
- Expected performance: 90th percentile <700ms
- Status: Comfortably exceeds AC-7

## Performance Optimization Checklist

- TensorRT FP16 models compiled and loaded
- CUDA graphs enabled for inference
- Batch size = 1 (real-time constraint)
- Asynchronous GPU operations where possible
- Memory pre-allocated (no runtime allocation)
- Factor graph incremental updates (iSAM2)

## Monitoring and Profiling

- **NVIDIA Nsight**: GPU utilization >80% during processing
- **CPU Usage**: <50% (GPU-bound workload)
- **Memory**: Stable (no leaks over 100+ images)
- **Thermal**: GPU <85°C sustained

## Notes

- AC-7 specifies "processing one image", interpreted here as latency per image
- The 5-second budget is generous given the target of ~500ms actual performance
- Margin allows for:
  - Older hardware (RTX 2060)
  - Complex scenarios (multiple layers active)
  - Factor graph growth over long flights
  - System overhead
- Real-time (<100ms) is not required; <5s is the operational target
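
The per-image timing, the Step 5 statistics, and the numeric Pass/Fail thresholds could be checked with a small harness along the following lines. This is a minimal sketch, not the project's actual test code: `time_frames`, `percentile`, `evaluate_ac7`, and the `process_frame` callable are hypothetical names introduced for illustration, and the nearest-rank percentile method is an assumption, since this document does not say how percentiles are computed. The thresholds (5000ms limit, 1000ms mean, 2000ms 99th percentile) are taken from the Pass/Fail Criteria.

```python
import math
import time
from statistics import mean, median

# Thresholds from the Pass/Fail Criteria section.
AC7_LIMIT_MS = 5000.0
MEAN_LIMIT_MS = 1000.0
P99_LIMIT_MS = 2000.0


def percentile(samples, pct):
    """Nearest-rank percentile (assumed method; the spec does not name one)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_frames(process_frame, frames, warmup=1):
    """Return per-frame wall-clock latency in ms, skipping warm-up frames.

    `process_frame` is a hypothetical callable that runs the full pipeline
    for one image. For GPU work it must block until results are ready
    (for example by synchronizing the device) or the timing is meaningless.
    """
    latencies = []
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        process_frame(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if index >= warmup:  # first frame(s) excluded, per Preconditions
            latencies.append(elapsed_ms)
    return latencies


def evaluate_ac7(latencies_ms):
    """Compute summary statistics and the numeric PASS conditions."""
    stats = {
        "mean": mean(latencies_ms),
        "median": median(latencies_ms),
        "p90": percentile(latencies_ms, 90),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }
    checks = {
        "all_under_5000ms": stats["max"] < AC7_LIMIT_MS,
        "mean_under_1000ms": stats["mean"] < MEAN_LIMIT_MS,
        "p99_under_2000ms": stats["p99"] < P99_LIMIT_MS,
    }
    return stats, checks, all(checks.values())


if __name__ == "__main__":
    # Illustrative latencies only, not measurements from real hardware.
    sample_ms = [200.0, 240.0, 280.0, 300.0, 350.0,
                 400.0, 500.0, 650.0, 720.0, 800.0]
    stats, checks, passed = evaluate_ac7(sample_ms)
    for name, value in stats.items():
        print(f"{name}: {value:.0f}ms")
    for name, ok in checks.items():
        print(f"{name}: {'ok' if ok else 'VIOLATED'}")
    print("Status:", "PASS" if passed else "FAIL")
```

The sketch covers only the numeric criteria. Verifying that TensorRT FP16 is active, and that latency does not drift upward over a long run, would need separate checks. Note also that with only 10 samples the 95th and 99th nearest-rank percentiles both equal the maximum, so a larger run (such as the 100+ images mentioned under Monitoring and Profiling) gives more meaningful tail statistics.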