gps-denied-onboard/docs/03_tests/42_performance_sustained_throughput_spec.md
Oleksandr Bezdieniezhnykh 4f8c18a066 add tests
gen_tests updated
solution.md updated
2025-11-24 22:57:46 +02:00

Acceptance Test: Sustained Performance Throughput

Summary

Validate that AC-7 performance (<5s per image) is maintained throughout long flight sequences without degradation from memory growth, thermal throttling, or resource exhaustion.

Linked Acceptance Criteria

AC-7: Less than 5 seconds for processing one image (sustained over entire flight).

Preconditions

  • ASTRAL-Next system deployed on target hardware
  • Hardware: NVIDIA RTX 3070 (or RTX 2060 minimum)
  • Long flight test dataset available
  • System monitoring tools active (GPU-Z, nvidia-smi, memory profiler)

Test Data

  • Short Flight: AD000001-AD000030 (30 images, ~3.6km flight)
  • Full Flight: AD000001-AD000060 (60 images, ~7.2km flight)
  • Simulated Long Flight: AD000001-AD000060 repeated 10x (600 images, ~72km)
  • Maximum Scale: 1500 images (target operational max from requirements)
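
The simulated long flight can be assembled by repeating the 60-image sequence. The sketch below is a minimal illustration; the helper names `image_ids` and `simulated_long_flight` are hypothetical, and it assumes the image IDs are the prefix `AD` followed by a six-digit counter, as in the ranges listed above.

```python
def image_ids(first: int = 1, last: int = 60, prefix: str = "AD", width: int = 6) -> list[str]:
    """Build zero-padded image IDs such as AD000001 (assumed naming scheme)."""
    return [f"{prefix}{i:0{width}d}" for i in range(first, last + 1)]


def simulated_long_flight(repeats: int = 10) -> list[str]:
    """Repeat the full 60-image flight to simulate a longer one."""
    return image_ids() * repeats


flight = simulated_long_flight()
print(len(flight), flight[0], flight[59], flight[60])
```

With the default arguments this yields 600 IDs, and the sequence restarts at AD000001 after every 60th image.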

Test Steps

Step 1: Baseline Short Flight Performance

Action: Process 30-image flight, measure per-image timing

Expected Result:

Images: 30
Mean time: <500ms per image
Total time: <15s
Memory growth: <100MB
GPU temp: <75°C
Status: BASELINE_ESTABLISHED
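
Per-image timing for the baseline could be collected with a small harness like the one below. This is a sketch under stated assumptions: `fake_process` is a hypothetical stand-in for the real per-image pipeline call, which this document does not name, and wall-clock time is measured with `time.perf_counter`.

```python
import time
from statistics import mean


def time_per_image(process, images):
    """Return per-image wall-clock latencies in milliseconds."""
    latencies_ms = []
    for image in images:
        start = time.perf_counter()
        process(image)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def fake_process(image):
    """Hypothetical stand-in for the real pipeline; sleeps about 1 ms."""
    time.sleep(0.001)


latencies = time_per_image(fake_process, range(30))
# Baseline thresholds from Step 1: mean < 500 ms and total < 15 s.
baseline_ok = mean(latencies) < 500 and sum(latencies) < 15_000
print(f"mean={mean(latencies):.1f} ms, total={sum(latencies) / 1000:.2f} s, ok={baseline_ok}")
```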

Step 2: Full Dataset Performance

Action: Process 60-image flight, monitor for degradation

Expected Result:

Images: 60
Mean time: <550ms per image (slight increase acceptable)
Total time: <33s
Memory growth: <200MB (factor graph growth)
GPU temp: <78°C
No thermal throttling: True
Status: FULL_FLIGHT_STABLE

Step 3: Extended Flight Simulation

Action: Process 600 images (10x repeat of dataset)

Expected Result:

Images: 600
Mean time per image: <600ms (monitoring degradation)
90th percentile: <800ms
99th percentile: <1200ms
Max: <2000ms
All images: <5000ms (AC-7 maintained)
Total time: <6 minutes
Status: EXTENDED_FLIGHT_STABLE
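
The latency checks above can be evaluated from the recorded per-image timings. The following sketch uses the nearest-rank percentile definition, which is an assumption; the spec does not say which percentile method is intended. The function names are illustrative.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_extended_flight(latencies_ms):
    """Evaluate the Step 3 thresholds; every value must be True to pass."""
    return {
        "mean_ok": sum(latencies_ms) / len(latencies_ms) < 600,
        "p90_ok": percentile(latencies_ms, 90) < 800,
        "p99_ok": percentile(latencies_ms, 99) < 1200,
        "max_ok": max(latencies_ms) < 2000,
        "ac7_ok": all(t < 5000 for t in latencies_ms),  # AC-7 applies to every image
    }


# Synthetic latencies between 400 and 499 ms, for illustration only.
sample = [400 + (i % 100) for i in range(600)]
print(check_extended_flight(sample))
```

A single image at or above 5000 ms flips `ac7_ok` to False regardless of how good the mean and percentiles are, which matches the "ALL images" wording of AC-7.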

Step 4: Memory Stability Analysis

Action: Monitor memory usage over extended flight

Expected Result:

Initial memory: 2-3GB (models loaded)
After 100 images: 3-4GB (factor graph)
After 300 images: 4-5GB (larger graph)
After 600 images: 5-6GB (stable, not growing unbounded)
Memory leaks: None detected
Factor graph pruning: Active (old nodes marginalized)
Status: MEMORY_STABLE
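
One way to tell bounded growth from a leak is to fit a line to the most recent memory samples and look at its slope. The sketch below assumes one memory sample (in MB) per processed image; the 200-image tail and the 5 MB/image slope limit are hypothetical values chosen for illustration, not thresholds taken from this spec.

```python
import math


def slope_mb_per_image(samples_mb):
    """Least-squares slope of memory (MB) against image index."""
    n = len(samples_mb)
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_mb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def growth_is_bounded(samples_mb, tail=200, max_slope=5.0):
    """True if memory over the last `tail` images grows slower than `max_slope` MB/image."""
    return slope_mb_per_image(samples_mb[-tail:]) <= max_slope


# Synthetic traces: one that levels off near 5.4 GB and one that grows linearly.
stable = [2500 + 3000 * (1 - math.exp(-i / 200)) for i in range(600)]
leaky = [2500 + 15 * i for i in range(600)]
print(growth_is_bounded(stable), growth_is_bounded(leaky))
```

Looking only at the tail avoids penalizing the expected early growth while models load and the factor graph fills its active window.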

Step 5: Thermal Management

Action: Monitor GPU temperature over extended processing

Expected Result:

Idle: 40-50°C
Initial processing: 65-70°C
After 15 minutes: 70-75°C
After 30 minutes: 72-78°C (stable)
Max temperature: <85°C (throttling threshold)
Thermal throttling events: 0
Fan speed: 60-80% (adequate cooling)
Status: THERMAL_STABLE
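
Temperature, fan speed, utilization, and memory can be sampled by polling `nvidia-smi` in CSV mode. The sketch below is one possible approach, not a prescribed tool. It assumes a single GPU and uses the query fields `temperature.gpu`, `fan.speed`, `utilization.gpu`, and `memory.used`; the function names are illustrative, and the 85°C limit is the throttling threshold stated above.

```python
import subprocess

QUERY = "temperature.gpu,fan.speed,utilization.gpu,memory.used"


def _to_float(field):
    """Convert one CSV field; unsupported readings such as [N/A] become None."""
    field = field.strip()
    return None if field in ("[N/A]", "N/A", "[Not Supported]") else float(field)


def parse_gpu_sample(line):
    """Parse one line of `nvidia-smi --format=csv,noheader,nounits` output."""
    temp_c, fan_pct, util_pct, mem_mib = (_to_float(f) for f in line.split(","))
    return {"temp_c": temp_c, "fan_pct": fan_pct, "util_pct": util_pct, "mem_used_mib": mem_mib}


def read_gpu_sample():
    """Query the first GPU once; requires nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_sample(out.splitlines()[0])


def exceeds_thermal_limit(sample, limit_c=85.0):
    """True if the sampled temperature reaches the throttling threshold."""
    return sample["temp_c"] is not None and sample["temp_c"] >= limit_c


# Parsing a sample line, so the example runs without a GPU present.
snapshot = parse_gpu_sample("74, 65, 85, 5324")
print(snapshot, exceeds_thermal_limit(snapshot))
```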

Step 6: Performance Degradation Analysis

Action: Compare first 60 images vs last 60 images in 600-image flight

Expected Result:

First 60 images:
  Mean: 450ms
  90th percentile: 600ms
  
Last 60 images:
  Mean: 550ms (acceptable increase)
  90th percentile: 700ms
  
Degradation: ~22% in mean, ~17% at 90th percentile (within the <30% pass limit)
Root cause: Larger factor graph (expected)
Mitigation: iSAM2 incremental updates working
Status: ACCEPTABLE_DEGRADATION
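
Degradation here is the relative increase in mean latency between the first and last 60 images. A minimal sketch of that calculation follows; the function name and the synthetic latency list are illustrative. With the example means of 450ms and 550ms, the increase is (550 - 450) / 450, or about 22.2%, which is below the 30% limit in the pass criteria.

```python
from statistics import mean


def degradation_pct(latencies_ms, window=60):
    """Relative increase (in %) of mean latency, last `window` images vs first `window`."""
    first = mean(latencies_ms[:window])
    last = mean(latencies_ms[-window:])
    return (last - first) / first * 100.0


# Synthetic 600-image run: 450 ms at the start, 550 ms at the end.
example = [450.0] * 60 + [500.0] * 480 + [550.0] * 60
print(f"{degradation_pct(example):.1f}%")
```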

Step 7: Maximum Scale Test (1500 images)

Action: Process or simulate 1500-image flight (operational maximum)

Expected Result:

Images: 1500
Processing time: <15 minutes total
Mean per image: <600ms
All images: <5000ms (AC-7 requirement met)
Memory: <8GB (within RTX 3070 limits)
System stability: No crashes, hangs, or errors
Status: MAX_SCALE_PASS

Pass/Fail Criteria

PASS if:

  • ALL images across all test scales process in <5000ms
  • Performance degradation <30% over 600+ images
  • No memory leaks detected (unbounded growth)
  • No thermal throttling occurs
  • System remains stable through 1500+ image flights
  • Mean processing time <1000ms across all scales

FAIL if:

  • ANY image takes ≥5000ms at any point in flight
  • Performance degradation >50% over extended flight
  • Memory leaks detected (>10GB for 600 images)
  • Thermal throttling reduces performance
  • System crashes on long flights
  • Mean processing time >2000ms
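
The criteria above can be encoded as a verdict function. The sketch below is illustrative only: `RunSummary` and its fields are hypothetical names, any throttling event is treated as a failure, and results that fall between the PASS and FAIL thresholds (for example 40% degradation) are reported as `INCONCLUSIVE`. That last label is an assumption, since the criteria do not say how such results should be classified.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    max_latency_ms: float
    mean_latency_ms: float
    degradation_pct: float
    leak_detected: bool
    throttling_events: int
    crashed: bool


def verdict(r: RunSummary) -> str:
    """Return PASS, FAIL, or INCONCLUSIVE (assumed label for the gap between thresholds)."""
    failed = (
        r.max_latency_ms >= 5000
        or r.degradation_pct > 50
        or r.leak_detected
        or r.throttling_events > 0
        or r.crashed
        or r.mean_latency_ms > 2000
    )
    if failed:
        return "FAIL"
    passed = r.degradation_pct < 30 and r.mean_latency_ms < 1000
    return "PASS" if passed else "INCONCLUSIVE"


print(verdict(RunSummary(1800, 520, 22.2, False, 0, False)))
```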

Performance Optimization for Sustained Throughput

Factor Graph Pruning

  • Marginalize old nodes after 100 frames
  • Keep only last 50 frames in active optimization
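  • Bounds per-frame optimization cost: O(n²) in the total frame count n without pruning → O(1) with respect to flight length, because the active window has a fixed size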
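
The windowing idea can be sketched as pure bookkeeping over frame IDs. The class below is not a factor graph and does not use any optimizer library; `ActiveWindow` is a hypothetical name, and the window size of 50 is taken from the bullet above. In a real system, evicted frames would be marginalized into a prior on the remaining variables rather than simply dropped.

```python
from collections import deque


class ActiveWindow:
    """Tracks which frames stay in active optimization (bookkeeping only)."""

    def __init__(self, size: int = 50):
        self.size = size
        self.active = deque()
        self.marginalized = 0

    def add_frame(self, frame_id):
        """Add a frame and return the frames that left the window."""
        self.active.append(frame_id)
        evicted = []
        while len(self.active) > self.size:
            evicted.append(self.active.popleft())
        self.marginalized += len(evicted)
        return evicted


window = ActiveWindow(size=50)
for frame in range(600):
    window.add_frame(frame)
print(len(window.active), window.marginalized, window.active[0])
```

Because the number of active frames never exceeds the window size, the work done per new frame stays roughly constant however long the flight runs.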

Incremental Optimization (iSAM2)

  • Avoid full graph re-optimization each frame
  • Update only affected nodes
  • 10-20x speedup vs full batch optimization

Memory Management

  • Pre-allocate buffers for images and features
  • Release satellite tile cache for old regions
  • GPU memory defragmentation every 100 frames

Thermal Management

  • Monitor GPU temperature continuously
  • Reduce batch size if approaching thermal limits
  • Optional: reduce processing rate to maintain <80°C
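
A simple controller for the batch-size reduction might look like the sketch below. All parameters are assumptions for illustration: the 80°C soft limit comes from the last bullet, while the 5°C recovery margin, the halving step, and the batch-size bounds are hypothetical.

```python
def next_batch_size(current: int, temp_c: float, soft_limit_c: float = 80.0,
                    min_batch: int = 1, max_batch: int = 8) -> int:
    """Halve the batch size when hot; grow it slowly once well below the limit."""
    if temp_c >= soft_limit_c:
        return max(min_batch, current // 2)
    if temp_c < soft_limit_c - 5.0:  # hypothetical recovery margin to avoid oscillation
        return min(max_batch, current + 1)
    return current


print(next_batch_size(8, 82.0), next_batch_size(4, 70.0), next_batch_size(4, 77.0))
```

Backing off quickly and recovering slowly keeps the GPU from oscillating around the limit.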

Monitoring Dashboard Metrics

Current FPS: 2-3 images/second
Current Latency: 350-500ms per image
Images Processed: 450/1500
Estimated Completion: 6 minutes
GPU Utilization: 85%
GPU Memory: 5.2GB / 8GB
GPU Temperature: 74°C
CPU Usage: 40%
System Memory: 12GB / 32GB
Status: HEALTHY
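
The estimated completion time follows from the remaining image count and the current throughput. A minimal sketch, with a hypothetical function name: for the snapshot above, 1050 images remain, which takes between 350s and 525s at 3 and 2 images/second respectively, so the 6-minute estimate sits near the faster end of that range.

```python
def eta_seconds(processed: int, total: int, images_per_second: float) -> float:
    """Remaining time in seconds at the current throughput."""
    if images_per_second <= 0:
        raise ValueError("throughput must be positive")
    return (total - processed) / images_per_second


fast = eta_seconds(450, 1500, 3.0)
slow = eta_seconds(450, 1500, 2.0)
print(f"ETA between {fast / 60:.1f} and {slow / 60:.1f} minutes")
```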

Failure Modes Tested

  • Memory Leak: Unbounded factor graph growth
  • Thermal Throttling: GPU overheating reduces performance
  • Cache Thrashing: Satellite tile cache inefficiency
  • Optimization Slowdown: Factor graph becomes too large
  • Resource Exhaustion: Running out of GPU memory

Notes

  • Sustained performance is more demanding than single-image performance
  • Real operational flights span 500-1500 images (per the requirement specification)
  • 5-second budget per image allows comfortable sustained operation
  • System architecture designed for long-term stability
  • iSAM2 incremental optimization is critical for scalability