# Performance Tests
> Deployment-binding numbers require Tier T4 (real Jetson Orin Nano Super @ 25 W). T1 runs are functional plausibility checks only — same caveat as `test-data.md` D2.
---
### NFT-PERF-01: End-to-end latency p95 ≤400 ms (AC-4.1)
**Summary**: From camera-frame capture to GPS_INPUT emission, p95 latency ≤ 400 ms on Orin Nano Super @ 25 W.
**Traces to**: AC-4.1. Tier: T4 (`deferred-hil`) for binding result; T1 functional smoke.
**Metric**: end-to-end latency in ms, sampled per-frame, aggregated to p50 / p95 / p99.
**Preconditions**:
- Tier T4: real Jetson Orin Nano Super, 25 W power mode (`nvpmodel -m 0` + 25 W profile), thermals stabilized at +25 °C ambient.
- TRT engines warmed (≥1 min steady-state replay before measurement).
- 30-min sustained replay of `synthetic_8h_load` slice (or AerialVL S03 mid-segment).
- Frame timestamping uses the camera-shim `time_usec` and matches against the GPS_INPUT `time_usec`.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Stream nav-cam frames at 3 fps for 30 min after warm-up | per-frame `(t_emit_gps_input - t_capture)` |
| 2 | Drop the first 60 s as warm-up | aggregate the rest |
| 3 | Compute p50, p95, p99, max | report |
| 4 | Verify drop rate | `dropped_frames / total_frames ≤ 10%` |
**Pass criteria**: p95 ≤ 400 ms; drop rate ≤ 10 % (per AC-4.1's "skip-allowed" clause).
**Duration**: 30 min + 60 s warm-up.
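The aggregation in steps 2–4 can be sketched as follows — a minimal illustration, not the harness itself; the `(t_capture, t_emit_gps_input)` pairing and the nearest-rank percentile choice are assumptions:

```python
"""Sketch of the NFT-PERF-01 aggregation (field pairing and percentile
method are assumptions, not the documented harness)."""
from math import ceil

WARMUP_S = 60.0  # step 2: drop the first 60 s as warm-up

def percentile(samples, p):
    """Nearest-rank percentile over a sorted copy of the samples."""
    s = sorted(samples)
    k = max(ceil(p / 100.0 * len(s)) - 1, 0)
    return s[k]

def aggregate(frames, total_frames):
    """frames: ascending list of (t_capture_s, t_emit_gps_input_s) for frames
    that produced a GPS_INPUT; total_frames: count streamed by the camera shim."""
    t0 = frames[0][0]
    lat_ms = [(emit - cap) * 1000.0
              for cap, emit in frames if cap - t0 >= WARMUP_S]
    drop_rate = 1.0 - len(frames) / total_frames  # step 4
    return {
        "p50": percentile(lat_ms, 50),
        "p95": percentile(lat_ms, 95),
        "p99": percentile(lat_ms, 99),
        "max": max(lat_ms),
        "drop_rate": drop_rate,
        "pass": percentile(lat_ms, 95) <= 400.0 and drop_rate <= 0.10,
    }
```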
---
### NFT-PERF-02: cuVSLAM single-frame latency ≤20 ms
**Summary**: cuVSLAM inference completes within 20 ms per frame.
**Traces to**: results_report row 37, F-T1b. Tier: T4 binding; T1 functional.
**Metric**: cuVSLAM per-frame inference duration, p95.
**Preconditions**: cuVSLAM warmed; mono+IMU mode.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 5 min of nav-cam frames at 3 fps | per-frame `cuvslam_inference_ms` (publicly exposed metric) |
| 2 | p95 over the run | report |
**Pass criteria**: p95 ≤ 20 ms.
**Duration**: 5 min.
---
### NFT-PERF-03: Cross-view matcher latency
**Summary**: Inline matcher (SP+LG TRT FP16/INT8) ≤ 200 ms / pair; LiteSAM re-loc fallback ≤ 2000 ms / pair.
**Traces to**: AC-4.1 (sub-budget), results_report row 38. Tier: T4 binding.
**Metric**: per-pair matcher inference time, p95.
**Preconditions**: matcher warmed; representative resolution (1024×768 SP+LG / GIM-LG).
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 1000 cross-view pairs through inline path | `inline_matcher_ms` per pair |
| 2 | Replay 100 cross-view pairs through re-loc path | `reloc_matcher_ms` per pair |
**Pass criteria**: inline p95 ≤ 200 ms; re-loc p95 ≤ 2000 ms.
**Duration**: ≤30 min.
---
### NFT-PERF-04: Orthority per-frame latency ≤50 ms
**Summary**: Orthority's per-frame ortho call on Orin Nano Super stays within budget.
**Traces to**: F-T14, M-27. Tier: T4 binding. If the budget is exceeded, fall back to `cv2.warpPerspective` + bilinear DEM sampling, per the documented Component 1b fallback.
**Metric**: ortho per-frame duration, p95.
**Preconditions**: Orthority loaded; SRTM-30 m DEM mmap warm; sector classified `flat` or `moderate`.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 1000 frames | per-frame `ortho_ms` |
**Pass criteria**: p95 ≤ 50 ms. If FAIL: open task to switch to fall-back path (not a blocking gate at this test, but a flow trigger).
**Duration**: ≤10 min.
---
### NFT-PERF-05: Spoofing-promotion latency ≤3 s p95 (AC-NEW-2)
**Summary**: Time from spoof onset to SUT promotion as primary GPS source.
**Traces to**: AC-NEW-2. Tier: T3 (`deferred-sitl`).
**Metric**: t_promote = `t_promotion_event - t_spoof_onset`, p95 over 50 trials.
**Preconditions**: SITL + `gps-spoof-injector`; FC EKF3 lane-switch event observable via `EKF_STATUS_REPORT`.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | At t=0 inject spoof signal | observe SUT GPS_INPUT promotion (raised `fix_type` to 3D-fix-with-priority + STATUSTEXT `PROMOTE`) |
| 2 | Repeat 50 trials with randomised spoof magnitudes | distribution |
**Pass criteria**: p95 ≤ 3 s.
**Duration**: ≤30 min.
---
### NFT-PERF-06: Frame-by-frame output cadence (AC-4.4)
**Summary**: GPS_INPUT is streamed per-frame, not batched.
**Traces to**: AC-4.4. Tier: T1 + T4.
**Metric**: inter-frame interval distribution.
**Preconditions**: 30 min steady-state replay.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay at 3 fps | sniff GPS_INPUT timestamps |
| 2 | Compute inter-arrival deltas | distribution |
| 3 | Verify no frame is delayed >1 inter-frame interval | — |
**Pass criteria**: |Δt - 1/3 s| ≤ 50 ms for ≥99 % of frames; no batches (no clusters of frames within the same 50 ms window).
**Duration**: 30 min.
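The cadence and no-batching checks above can be sketched as below — an illustrative check under assumed names, not the sniffer itself:

```python
"""Sketch of the NFT-PERF-06 cadence check (function and field names assumed)."""

NOMINAL_S = 1.0 / 3.0   # 3 fps nominal inter-frame interval
JITTER_S = 0.050        # |dt - 1/3 s| <= 50 ms tolerance
CLUSTER_S = 0.050       # two messages inside 50 ms count as a batch

def check_cadence(timestamps):
    """timestamps: sniffed GPS_INPUT arrival times in seconds, ascending."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    within = sum(1 for d in deltas if abs(d - NOMINAL_S) <= JITTER_S)
    batched = any(d < CLUSTER_S for d in deltas)  # cluster => batching
    frac_ok = within / max(len(deltas), 1)
    return {"frac_within_tol": frac_ok,
            "pass": frac_ok >= 0.99 and not batched}
```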
---
### NFT-PERF-07: GPS_INPUT message rate (results_report row 9)
**Summary**: GPS_INPUT emitted at 5–10 Hz continuous (per-frame at 3 fps, plus duplicates for FC stability when configured).
**Traces to**: AC-4.3, results_report row 9. Tier: T1.
**Metric**: rate over 60 s windows.
**Preconditions**: steady-state tracking.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Sniff GPS_INPUT for 5 min | per-second rate |
**Pass criteria**: rate ∈ [5, 10] Hz throughout.
**Duration**: 5 min.
---
### NFT-PERF-08: VPR latency under conditional invocation
**Summary**: VPR's DINOv2 forward only fires on re-loc triggers; in cruise it stays near zero CPU/GPU.
**Traces to**: AC-8.6, restrictions §Satellite (VPR retrieval unit). Tier: T4.
**Metric**: VPR invocations / second; cruise idle vs re-loc burst.
**Preconditions**: 60-min replay with scripted re-loc triggers (cold start, sharp turn, σ_xy > 50 m, VO failure ≥2 frames).
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Run replay | per-second `vpr_invocations` counter |
| 2 | Compute average across cruise window vs re-loc window | — |
**Pass criteria**:
- Cruise window (no triggers): VPR invocations / 100 frames ≤ 1 (i.e., not invoked per-frame).
- Re-loc window: VPR invokes within 1 frame of trigger; latency ≤ 200 ms p95 for the DINOv2 forward.
**Duration**: 60 min.
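One way to evaluate the two pass criteria from per-frame counters — a simplified sketch in which the per-frame booleans, the "within 1 frame" promptness rule, and the exclusion of trigger-adjacent frames from the cruise window are all assumptions:

```python
"""Sketch of the NFT-PERF-08 window evaluation (per-frame booleans and the
cruise/re-loc window split are simplifying assumptions)."""

def check_vpr_policy(invoked, triggers):
    """invoked[i]: VPR fired on frame i; triggers[i]: re-loc trigger on frame i."""
    n = len(invoked)
    # Frames at or right after a trigger belong to the re-loc window.
    near_trigger = {j for i, t in enumerate(triggers) if t
                    for j in (i, i + 1) if j < n}
    cruise = [i for i in range(n) if i not in near_trigger]
    cruise_per_100 = 100.0 * sum(invoked[i] for i in cruise) / max(len(cruise), 1)
    # Every trigger must see a VPR invocation within 1 frame.
    prompt = all(invoked[i] or (i + 1 < n and invoked[i + 1])
                 for i, t in enumerate(triggers) if t)
    return {"cruise_per_100": cruise_per_100,
            "pass": cruise_per_100 <= 1.0 and prompt}
```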
---
### NFT-PERF-09: Top-K dynamic sizing matches sector / σ_xy
**Summary**: VPR top-K honours AC-8.6 dynamic-K rules.
**Traces to**: AC-8.6. Tier: T1 + T4.
**Metric**: K value selected per VPR call vs sector class + σ_xy.
**Preconditions**: scripted scenarios with (sector ∈ {stable, active}) × (σ_xy ∈ {10, 30, 60}).
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Trigger VPR in each combination | observe `vpr_top_k` metric |
**Pass criteria**:
- stable + σ_xy ≤ 20 m → K=5.
- active-conflict → K=20.
- expanding-window fallback (σ_xy > 50 m or fail-N) → K=50.
**Duration**: 5 min.
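The dynamic-K rule under test can be sketched as follows. Only the three listed combinations are specified by AC-8.6 as quoted here; the precedence ordering and the default for unlisted combinations (e.g. stable with 20 m < σ_xy ≤ 50 m) are assumptions:

```python
"""Sketch of the AC-8.6 dynamic-K rule exercised by NFT-PERF-09.
Precedence and the in-between default are assumptions."""

def select_top_k(sector, sigma_xy_m, fail_n=0):
    # Expanding-window fallback dominates: high uncertainty or repeated failures.
    if sigma_xy_m > 50.0 or fail_n > 0:
        return 50
    if sector == "active":                         # active-conflict sector
        return 20
    if sector == "stable" and sigma_xy_m <= 20.0:  # tight, stable case
        return 5
    return 20  # assumed conservative default for unlisted combinations
```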
---
### NFT-PERF-10: Failsafe latency ≤3 s no-fix → FC fallback (AC-5.2)
**Summary**: When SUT cannot produce any estimate for >3 s, FC observably falls back to IMU-only DR.
**Traces to**: AC-5.2. Tier: T3.
**Metric**: time from last-fix-emission to FC fallback signal in `EKF_STATUS_REPORT`.
**Preconditions**: scripted blackout in SITL.
**Steps**: script the pipeline blackout in SITL; observe the FC's `EKF_STATUS_REPORT` for the fallback signal.
**Pass criteria**: FC fallback observable within 4 s of blackout (3 s budget + 1 s observation latency).
**Duration**: 5 min.
---
### NFT-PERF-11: Bench-off candidates — accuracy-vs-latency frontier
**Summary**: Score inline matcher candidates on the documented bench-off corpora.
**Traces to**: AC-1.1 / AC-1.2 / AC-2.2 / R2 / R3, F-T15. Tier: T2.
**Metric**: per-candidate (recall@30 m, p95 latency, peak GPU mem, sustained 30-min thermal stability, seasonal-robustness score).
**Preconditions**: AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic.
**Steps**: run each candidate (SP+LG, GIM-LG, XFeat sparse, XFeat semi-dense) and each ceiling reference (RoMa v2, MASt3R-SLAM, MapGlue, MATCHA — offline only) over the corpora.
**Pass criteria**:
- Inline candidates must fit in 200 ms / pair on Orin Nano Super @ 25 W.
- Re-loc candidates (LiteSAM) must fit in 2 s / pair.
- Selected inline matcher's recall@30 m on AerialVL S03 must support AC-1.1 / AC-1.2.
**Duration**: 4 h Monte Carlo.
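The selection logic implied by the pass criteria can be sketched as a budget-then-accuracy filter — an illustrative reduction with assumed record fields, not the bench-off scoring script:

```python
"""Sketch of the NFT-PERF-11 selection step (record fields are assumptions)."""

INLINE_BUDGET_MS = 200.0   # Orin Nano Super @ 25 W per-pair budget

def pick_inline_matcher(candidates):
    """candidates: dicts with 'name', 'recall_at_30m', 'p95_ms'.
    Keep candidates within the latency budget, then take the best recall."""
    feasible = [c for c in candidates if c["p95_ms"] <= INLINE_BUDGET_MS]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["recall_at_30m"])
```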
---
### NFT-PERF-12: Latency under adversarial input — no infinite stall
**Summary**: Pathological inputs (uniform-grey frame, all-black frame, very low contrast) do not cause unbounded latency.
**Traces to**: AC-3.x (resilience), AC-4.1 (negative). Tier: T1.
**Metric**: per-frame latency capped.
**Preconditions**: replay with 5 % of frames replaced by uniform-grey or all-black.
**Steps**: replay 30 min; observe latency CDF.
**Pass criteria**: each frame's latency ≤ 600 ms (1.5× p95 budget); pipeline never stalls beyond a single frame interval.
**Duration**: 30 min.
---
## Test execution caveats
- **T1 runs**: produced numbers are NOT deployment-binding. AC-4.1 / NFT-PERF-01 specifically requires Orin Nano Super 25 W (T4) for binding pass.
- **T4 runs**: bench scheduler enforces single-tenant access; thermal warm-up ≥1 min before measurement window starts.
- **Frame-rate floor**: AC-4.1 allows ~10 % drop under sustained load. Drop rate IS measured and reported in NFT-PERF-01.