gps-denied-onboard/docs/03_tests/41_performance_single_image_spec.md
Oleksandr Bezdieniezhnykh 4f8c18a066 add tests
gen_tests updated
solution.md updated
2025-11-24 22:57:46 +02:00

Acceptance Test: Single Image Processing Performance (<5 seconds)

Summary

Validate the AC-7 requirement that each individual image is processed in less than 5 seconds on the target hardware (NVIDIA RTX 2060/3070).

Linked Acceptance Criteria

AC-7: Less than 5 seconds for processing one image.

Preconditions

  • ASTRAL-Next system deployed on target hardware
  • Hardware: NVIDIA RTX 2060 (minimum) or RTX 3070 (recommended)
  • TensorRT FP16 optimized models loaded
  • System warmed up (first frame excluded from timing)
  • No other GPU-intensive processes running
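
The warm-up precondition can be sketched as a small timing harness. This is only an illustration: `process_image` is a hypothetical stand-in for the pipeline entry point, and `warmup_frames` is an assumed parameter name, neither of which comes from the ASTRAL-Next code base.

```python
import time
from typing import Callable, Iterable, List


def measure_latencies_ms(
    process_image: Callable[[str], object],
    image_ids: Iterable[str],
    warmup_frames: int = 1,
) -> List[float]:
    """Return per-image wall-clock latency in ms, skipping warm-up frames."""
    latencies: List[float] = []
    for index, image_id in enumerate(image_ids):
        start = time.perf_counter()
        process_image(image_id)  # hypothetical pipeline call
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Warm-up frames are still processed, but their timing is discarded.
        if index >= warmup_frames:
            latencies.append(elapsed_ms)
    return latencies
```

If the pipeline launches GPU work asynchronously, the callable has to block until the result is available; otherwise the timer stops before the work has finished.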

Test Data

  • Test Images: AD000001-AD000010 (representative sample), plus AD000032-AD000033 for the hard scenario
  • Scenarios:
    • Easy: Normal overlap, clear features (AD000001-AD000002)
    • Medium: Agricultural texture (AD000005-AD000006)
    • Hard: Minimal overlap (AD000032-AD000033)

Performance Breakdown Target

L1 SuperPoint+LightGlue: 50-150ms
L2 AnyLoc (keyframes only): 150-200ms
L3 LiteSAM (keyframes only): 60-100ms
Factor Graph Update: 50-100ms
Overhead (I/O, coordination): 50-100ms
-----------------------------------
Total (L1 only): <500ms
Total (L1+L2+L3): <700ms
Safety Margin: 4300ms
AC-7 Limit: 5000ms
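
As a quick check of the arithmetic, the upper bounds in the breakdown can be summed directly. The dictionary keys are illustrative names chosen for this sketch, not identifiers from the system.

```python
AC7_LIMIT_MS = 5000
ALL_LAYERS_TARGET_MS = 700  # the "<700ms" target for L1+L2+L3

# Upper bounds (ms) taken from the breakdown above.
STAGE_MAX_MS = {
    "l1_superpoint_lightglue": 150,
    "l2_anyloc": 200,
    "l3_litesam": 100,
    "factor_graph": 100,
    "overhead": 100,
}

# Stages that run on every frame; L2 and L3 run on keyframes only.
L1_ONLY = ("l1_superpoint_lightglue", "factor_graph", "overhead")

l1_only_worst = sum(STAGE_MAX_MS[name] for name in L1_ONLY)
all_layers_worst = sum(STAGE_MAX_MS.values())
safety_margin = AC7_LIMIT_MS - ALL_LAYERS_TARGET_MS

print(l1_only_worst)     # 350
print(all_layers_worst)  # 650
print(safety_margin)     # 4300
```

Both worst-case sums sit below their stated targets (<500ms and <700ms), so the targets include some slack on top of the per-stage bounds.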

Test Steps

Step 1: Measure Easy Scenario Performance

Action: Process AD000001 → AD000002 (normal overlap, clear features)

Expected Result:

Image Load: <50ms
L1 Processing: 80-120ms
Factor Graph: 30-50ms
Result Output: <20ms
---
Total: <240ms
Status: WELL_UNDER_LIMIT (4.8% of budget)

Step 2: Measure Medium Scenario Performance

Action: Process AD000005 → AD000006 (agricultural texture)

Expected Result:

Image Load: <50ms
L1 Processing: 100-150ms (more features)
Factor Graph: 40-60ms
Result Output: <20ms
---
Total: <280ms
Status: UNDER_LIMIT (5.6% of budget)

Step 3: Measure Hard Scenario Performance

Action: Process AD000032 → AD000033 (220.6m jump, minimal overlap)

Expected Result:

Image Load: <50ms
L1 Processing: 150-200ms (adaptive depth)
L1 Confidence: LOW → Triggers L2
L2 Processing: 150-200ms
L3 Refinement: 80-120ms
Factor Graph: 80-120ms (more complex)
Result Output: <30ms
---
Total: <720ms
Status: UNDER_LIMIT (14.4% of budget)
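
The effect of the L2/L3 fallback on the total can be sketched as follows. The stage names and the `l1_confidence_low` flag are assumptions made for this example; the real trigger condition is not specified in this document.

```python
from typing import Dict

# Upper bounds (ms) from the Step 3 expected result.
HARD_SCENARIO_MAX_MS: Dict[str, int] = {
    "image_load": 50,
    "l1": 200,
    "l2": 200,
    "l3": 120,
    "factor_graph": 120,
    "result_output": 30,
}


def expected_total_ms(stage_ms: Dict[str, int], l1_confidence_low: bool) -> int:
    """Sum stage times; L2 and L3 count only when L1 confidence is low."""
    skipped = set() if l1_confidence_low else {"l2", "l3"}
    return sum(ms for name, ms in stage_ms.items() if name not in skipped)


def budget_share_percent(total_ms: float, limit_ms: float = 5000.0) -> float:
    """Express a latency as a percentage of the AC-7 limit."""
    return 100.0 * total_ms / limit_ms


total = expected_total_ms(HARD_SCENARIO_MAX_MS, l1_confidence_low=True)
print(total)                        # 720
print(budget_share_percent(total))  # 14.4
```

With the fallback disabled, the same bounds sum to 400ms, so the L2 and L3 stages account for 320ms of the hard-scenario total.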

Step 4: Measure Worst-Case Performance

Action: Process with all layers active + large factor graph

Expected Result:

Image Load: 80ms
L1 Processing: 200ms
L2 Processing: 200ms
L3 Refinement: 120ms
Factor Graph: 150ms (200+ nodes)
Result Output: 50ms
---
Total: 800ms
Status: UNDER_LIMIT (16% of budget)

Step 5: Statistical Performance Analysis

Action: Process 10 representative images, calculate statistics

Expected Result:

Mean processing time: 350ms
Median processing time: 280ms
90th percentile: 500ms
95th percentile: 650ms
99th percentile: 800ms
Max: <900ms
All: <5000ms (AC-7 requirement)
Status: PASS
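
One way to compute these statistics is shown below, using only the Python standard library. The nearest-rank percentile definition is an assumption of this sketch; the specification does not say which percentile method to use.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct% of the data is at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Collect the summary statistics listed in Step 5."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p90": percentile_nearest_rank(latencies_ms, 90),
        "p95": percentile_nearest_rank(latencies_ms, 95),
        "p99": percentile_nearest_rank(latencies_ms, 99),
        "max": max(latencies_ms),
    }
```

With only 10 samples, the nearest-rank 95th and 99th percentiles both equal the maximum, so a larger run is needed before the tail percentiles carry independent information.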

Step 6: Verify TensorRT Optimization Impact

Action: Compare TensorRT FP16 vs PyTorch FP32 performance

Expected Result:

PyTorch FP32 (baseline): 800-1200ms per image
TensorRT FP16 (optimized): 250-400ms per image
Speedup: 2.5-3.5x
Without TensorRT: Still under the 5000ms AC-7 limit, but may miss the <1000ms mean target
With TensorRT: Comfortably passes AC-7
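
The speedup figure is a plain ratio of the two latencies. The sketch below pairs the low ends and the high ends of the two ranges, which is an assumption of this example; the document does not state how the 2.5-3.5x range was derived.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    return baseline_ms / optimized_ms


print(round(speedup(800, 250), 2))   # 3.2 (low end of both ranges)
print(round(speedup(1200, 400), 2))  # 3.0 (high end of both ranges)
```

Pairing opposite ends of the ranges instead gives 2.0x (800ms vs 400ms) and 4.8x (1200ms vs 250ms), so the measured speedup should be computed per image rather than from range endpoints.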

Pass/Fail Criteria

PASS if:

  • 100% of images process in <5000ms
  • Mean processing time <1000ms (20% of budget)
  • 99th percentile <2000ms (40% of budget)
  • TensorRT FP16 optimization active and verified
  • Performance consistent across easy/medium/hard scenarios

FAIL if:

  • ANY image takes ≥5000ms
  • Mean processing time >2000ms
  • System cannot maintain <5s with TensorRT optimization
  • Performance degrades over time (memory leak)
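
The numeric criteria can be checked automatically with something like the sketch below. `Verdict` and `evaluate_ac7` are hypothetical names, and the sketch treats any run that misses a PASS threshold as not passed, which is an assumption: the criteria above leave a mean between 1000ms and 2000ms undecided. Scenario consistency and degradation over time are not covered here.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Sequence

AC7_LIMIT_MS = 5000.0
MEAN_TARGET_MS = 1000.0
P99_TARGET_MS = 2000.0


@dataclass
class Verdict:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def evaluate_ac7(
    latencies_ms: Sequence[float],
    p99_ms: float,
    tensorrt_fp16_active: bool,
) -> Verdict:
    """Apply the numeric PASS thresholds and collect every violation."""
    if not latencies_ms:
        return Verdict(False, ["no latency samples recorded"])
    reasons: List[str] = []
    worst = max(latencies_ms)
    if worst >= AC7_LIMIT_MS:
        reasons.append(f"slowest image took {worst:.0f}ms (limit {AC7_LIMIT_MS:.0f}ms)")
    mean = statistics.fmean(latencies_ms)
    if mean >= MEAN_TARGET_MS:
        reasons.append(f"mean {mean:.0f}ms is not below {MEAN_TARGET_MS:.0f}ms")
    if p99_ms >= P99_TARGET_MS:
        reasons.append(f"p99 {p99_ms:.0f}ms is not below {P99_TARGET_MS:.0f}ms")
    if not tensorrt_fp16_active:
        reasons.append("TensorRT FP16 optimization not active")
    return Verdict(passed=not reasons, reasons=reasons)
```

Collecting all violations instead of stopping at the first one makes a failed run easier to diagnose from the test report.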

Hardware Requirements Validation

RTX 2060 (Minimum)

  • VRAM: 6GB
  • Expected performance: 90th percentile <1000ms
  • Status: Meets AC-7 with optimization

RTX 3070 (Recommended)

  • VRAM: 8GB
  • Expected performance: 90th percentile <700ms
  • Status: Comfortably meets AC-7

Performance Optimization Checklist

  • TensorRT FP16 models compiled and loaded
  • CUDA graphs enabled for inference
  • Batch size = 1 (real-time constraint)
  • Asynchronous GPU operations where possible
  • Memory pre-allocated (no runtime allocation)
  • Factor graph incremental updates (iSAM2)

Monitoring and Profiling

  • NVIDIA Nsight: GPU utilization >80% during processing
  • CPU Usage: <50% (GPU-bound workload)
  • Memory: Stable (no leaks over 100+ images)
  • Thermal: GPU <85°C sustained
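
The "stable memory" check could be approximated by fitting a straight line to per-image memory readings and looking at the slope. This is a sketch under stated assumptions: the readings are taken to be memory usage in MB sampled once per processed image, and the 0.1 MB/image threshold is a placeholder, not a value from this specification.

```python
from typing import Sequence


def slope_per_image(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory usage against image index (MB per image)."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_stable(samples_mb: Sequence[float], max_slope_mb: float = 0.1) -> bool:
    """True when the fitted growth rate stays at or below the assumed threshold."""
    return slope_per_image(samples_mb) <= max_slope_mb
```

A slope close to zero over 100+ images suggests no leak; a steady positive slope is worth investigating even when every individual image is still well under the 5-second limit.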

Notes

  • AC-7 specifies "processing one image", interpreted as latency per image
  • The 5-second budget is generous given the ~500ms target latency
  • Margin allows for:
    • Older hardware (RTX 2060)
    • Complex scenarios (multiple layers active)
    • Factor graph growth over long flights
    • System overhead
  • Real-time (<100ms) not required; <5s is operational target