gps-denied-onboard/_docs/02_document/tests/performance-tests.md
Oleksandr Bezdieniezhnykh cab7b5d020 [AZ-233] Update Docker Compose and enhance test documentation
- Modified the Docker Compose configuration to include an input root for replay tests and added an environment variable for enabling SITL.
- Enhanced documentation for various testing processes, including the addition of a Runtime Completeness Decomposition Gate and clarifications on internal module testing requirements.
- Updated the implementation completeness report to reflect the current state and added new test cases for performance and resilience scenarios.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-06 05:03:48 +03:00


# Performance Tests
### NFT-PERF-01: Per-Frame Latency On Project Still Images
**Summary**: Validate end-to-end latency for processing project nadir frames through geolocation output.
**Traces to**: AC-4.1, AC-4.4
**Metric**: Capture-to-output latency p50/p95/p99 and dropped-frame rate.
**Preconditions**:
- Jetson Orin Nano Super or equivalent production target is running in the intended power mode.
- `project_60_still_images` fixture is available.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay images at the 3 fps target rate or at a faster stress rate | Measure latency from input timestamp to emitted estimate |
| 2 | Record all frame drops | Measure dropped-frame percentage |

**Pass criteria**: p95 latency <400 ms; dropped frames <=10% under sustained load; no batching delay.
**Duration**: Minimum 20 minutes or full fixture loop repeated enough times to reach stable measurements.
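
The latency and drop-rate criteria above could be evaluated with a small helper like the one below. This is a minimal sketch, not the project's harness: `summarize_latency`, `LatencyReport`, the use of `None` to mark a dropped frame, and the nearest-rank percentile definition are all assumptions made for illustration. The "no batching delay" criterion is not covered here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def nearest_rank(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty, ascending sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100.0))
    return sorted_values[rank - 1]


@dataclass
class LatencyReport:
    p50_ms: float
    p95_ms: float
    p99_ms: float
    dropped_pct: float
    passed: bool


def summarize_latency(
    latencies_ms: Sequence[Optional[float]],
    p95_limit_ms: float = 400.0,     # from the pass criteria
    max_dropped_pct: float = 10.0,   # from the pass criteria
) -> LatencyReport:
    """Summarize one replay run.

    Each entry is the capture-to-output latency of one input frame in
    milliseconds, or None if the frame was dropped (assumed encoding).
    """
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("no frames recorded")

    emitted = sorted(v for v in latencies_ms if v is not None)
    dropped_pct = 100.0 * (total - len(emitted)) / total

    if not emitted:
        # Every frame was dropped: there is no latency to report.
        inf = float("inf")
        return LatencyReport(inf, inf, inf, dropped_pct, False)

    p50, p95, p99 = (nearest_rank(emitted, p) for p in (50, 95, 99))
    passed = p95 < p95_limit_ms and dropped_pct <= max_dropped_pct
    return LatencyReport(p50, p95, p99, dropped_pct, passed)
```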

---

### NFT-PERF-02: BASALT + Wrapper Replay Latency
**Summary**: Validate relative VIO hot-path latency using synchronized Derkachi video/telemetry and public or representative camera/IMU data.
**Traces to**: AC-2.1a, AC-4.1, AC-4.2
**Metric**: Per-frame VIO latency, completion rate, and memory usage.
**Preconditions**:
- Derkachi `flight_derkachi.mp4` and `data_imu.csv` are mounted and pass fixture validation.
- MUN-FRL/ALTO/EPFL/Kagaru or another representative synchronized dataset slice is pinned for calibrated final comparison.
- OpenVINS reference replay is available for comparison when the dataset supports it.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay Derkachi video from the 30 fps source at the 3 fps target rate and at stress rates | Measure per-frame processing time, dropped frames, and telemetry alignment |
| 2 | Replay synchronized camera/IMU stream through BASALT + wrapper | Measure VIO processing time and completion rate |
| 3 | Compare emitted trajectory against Derkachi `GLOBAL_POSITION_INT` and calibrated dataset ground truth where available | Measure completion rate and error distribution |
| 4 | Monitor memory | Track CPU/GPU shared memory peak |

**Pass criteria**: Normal-frame VO registration >95% on calibration-supported segments; p95 processing latency <400 ms for the hot path; memory <8 GB shared; Derkachi replay maintains stable 3-video-frames-per-telemetry-row alignment with <=10% dropped frames under sustained target-rate replay.
**Duration**: Dataset-dependent; at least one normal segment and one challenging segment.
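
One way to check the 3-video-frames-per-telemetry-row alignment from the pass criteria is to assign each frame to the latest telemetry row at or before it and count frames per row. The sketch below assumes integer millisecond timestamps for both streams and an ascending telemetry list; the function names and the assignment rule are hypothetical, and the actual layout of `data_imu.csv` is not specified here.

```python
from bisect import bisect_right
from typing import List, Sequence


def frames_per_row(
    frame_ts_ms: Sequence[int], telemetry_ts_ms: Sequence[int]
) -> List[int]:
    """Count video frames assigned to each telemetry row.

    A frame belongs to the latest telemetry row whose timestamp is at or
    before the frame timestamp. telemetry_ts_ms must be ascending.
    """
    counts = [0] * len(telemetry_ts_ms)
    for ts in frame_ts_ms:
        row = bisect_right(telemetry_ts_ms, ts) - 1
        if row >= 0:  # frames before the first telemetry row are ignored
            counts[row] += 1
    return counts


def alignment_is_stable(counts: Sequence[int], expected: int = 3) -> bool:
    """True when every telemetry row received exactly `expected` frames."""
    return bool(counts) and all(c == expected for c in counts)


# Illustrative data only: one second of 30 fps frames against telemetry
# rows spaced 100 ms apart (an assumed spacing, chosen to give 3 per row).
frames = [round(i * 1000 / 30) for i in range(30)]
telemetry = [j * 100 for j in range(10)]
counts = frames_per_row(frames, telemetry)
```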

---

### NFT-PERF-03: Relocalization Trigger Path Latency
**Summary**: Validate the heavy DINOv2-VLAD + FAISS + ALIKED/LightGlue path under bounded top-K settings.
**Traces to**: AC-3.2, AC-3.3, AC-4.1, AC-8.6
**Metric**: Trigger-to-anchor latency, top-K query time, local verification time, accepted/rejected anchor counts.
**Preconditions**:
- Precomputed descriptor index is loaded.
- Dynamic K settings are configured: K=5 stable, K=20 active-conflict, K=50 fallback.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Trigger relocalization from cold start or sharp turn | Measure DINOv2 descriptor time and FAISS query time |
| 2 | Verify top-K candidates | Measure ALIKED/LightGlue + RANSAC latency |
| 3 | Emit accepted/rejected decision | Measure total trigger-to-decision latency |

**Pass criteria**: The heavy path is conditional and never blocks steady-state frame output; each accepted anchor carries MRE <2.5 px and a valid covariance.
**Duration**: 100 relocalization trials across stable and active-conflict sector fixtures.
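
The dynamic K settings and the anchor acceptance rule above can be written down as a small lookup and predicate. This is a sketch under stated assumptions: the `SectorState` enum, `top_k_for`, and `accept_anchor` are hypothetical names, and "valid covariance" is interpreted here as a square, finite, symmetric matrix with a strictly positive diagonal, which the test plan itself does not define.

```python
import math
from enum import Enum
from typing import Sequence


class SectorState(Enum):
    STABLE = "stable"
    ACTIVE_CONFLICT = "active-conflict"
    FALLBACK = "fallback"


# K values taken from the preconditions above.
TOP_K = {
    SectorState.STABLE: 5,
    SectorState.ACTIVE_CONFLICT: 20,
    SectorState.FALLBACK: 50,
}

MAX_MRE_PX = 2.5  # accepted anchors must have MRE strictly below this


def top_k_for(state: SectorState) -> int:
    """Return the bounded top-K for the current sector state."""
    return TOP_K[state]


def covariance_is_valid(cov: Sequence[Sequence[float]]) -> bool:
    """Assumed validity check: square, finite, symmetric, positive diagonal."""
    n = len(cov)
    if n == 0 or any(len(row) != n for row in cov):
        return False
    for i in range(n):
        for j in range(n):
            if not math.isfinite(cov[i][j]):
                return False
            if not math.isclose(cov[i][j], cov[j][i], rel_tol=1e-9, abs_tol=1e-12):
                return False
        if cov[i][i] <= 0.0:
            return False
    return True


def accept_anchor(mre_px: float, cov: Sequence[Sequence[float]]) -> bool:
    """Accept an anchor only if both pass-criteria conditions hold."""
    return math.isfinite(mre_px) and mre_px < MAX_MRE_PX and covariance_is_valid(cov)
```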

---

### NFT-PERF-04: Cold Boot Time To First Fix
**Summary**: Validate companion boot to first valid `GPS_INPUT`.
**Traces to**: AC-NEW-1
**Metric**: Time from service start/boot marker to first valid `GPS_INPUT`.
**Preconditions**:
- Engines/indexes are built before the run.
- Cache/index is available locally.
- FC state handoff is simulated or provided.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Start service from cold boot condition | Measure initialization stages |
| 2 | Wait for first valid output | Measure first valid `GPS_INPUT` timestamp |

**Pass criteria**: p95 time to first valid `GPS_INPUT` <30 s over 50 runs.
**Duration**: 50 cold-start trials.
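
The 50-trial criterion above could be checked as follows. This is an illustrative sketch: the trial representation (a boot-marker timestamp paired with the first valid `GPS_INPUT` timestamp, or `None` if no fix arrived) and the nearest-rank percentile are assumptions, since the test plan does not fix a percentile method. Under nearest-rank with 50 trials, the p95 is the 48th-smallest value, so at most two trials may exceed the limit.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (boot_marker_s, first_valid_gps_input_s or None if no fix was emitted)
Trial = Tuple[float, Optional[float]]


def time_to_first_fix_s(trials: Sequence[Trial]) -> List[float]:
    """Per-trial time to first fix; a trial with no fix counts as infinite."""
    return [
        math.inf if fix_s is None else fix_s - boot_s
        for boot_s, fix_s in trials
    ]


def p95_nearest_rank(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def cold_boot_passes(trials: Sequence[Trial], limit_s: float = 30.0) -> bool:
    """True when the p95 time to first fix is strictly below the limit."""
    if not trials:
        raise ValueError("no trials recorded")
    return p95_nearest_rank(time_to_first_fix_s(trials)) < limit_s
```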

---

### NFT-PERF-INFRA: Replay Evidence Smoke
**Summary**: Validate that the Docker replay harness records timing evidence for the runnable local replay subset.
**Traces to**: AZ-234 AC-3, AZ-233 AC-3, AZ-233 AC-4
**Metric**: Scenario execution time and report generation status.
**Preconditions**:
- Docker replay environment is available.
- Project input fixtures are mounted read-only into the replay consumer.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Run the replay consumer in Docker mode | Confirm the performance smoke scenario executes |
| 2 | Inspect the generated CSV and FDR summary | Confirm execution time and artifact paths are recorded |

**Pass criteria**: `NFT-PERF-INFRA` reports `pass` and writes run-scoped CSV/Markdown evidence; Jetson-only performance evidence remains in release-gate resource tests.