gps-denied-onboard/_docs/02_document/tests/performance-tests.md
Oleksandr Bezdieniezhnykh c19c76481c Update autodev skill documentation and acceptance criteria
Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
2026-05-09 03:10:57 +03:00


Performance Tests

All performance tests honor the per-tier execution profile from environment.md. Latency and memory tests bound to Jetson Orin Nano Super hardware run on Tier-2 only; metrics that don't depend on hardware (e.g. inter-emit interval correctness, GCS rate) run on both tiers.

NFT-PERF-01: End-to-end latency p95 budget

Summary: Validates the AC-4.1 end-to-end latency budget (camera capture → GPS to FC) on the pinned hardware.
Traces to: AC-4.1, D-CROSS-LATENCY-1
Metric: Wall-clock latency from frame-capture timestamp to outbound GPS_INPUT (AP) / MSP2_SENSOR_GPS (iNav) reception at the SITL container.

Preconditions:

  • Tier-2 only — Jetson Orin Nano Super, JetPack 6.2, TensorRT 10.3 per D-C7-9.
  • tile-cache-fixture pre-loaded.
  • SUT cold-started THEN warmed up for 30 s of replay before measurement window starts.
  • Two configurations measured: (a) K=3 baseline at +25 °C, (b) K=2 + Jacobian-cov hybrid auto-degrade at +50 °C ambient (NFT-9 in the solution draft).

Steps:

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Run 30 s warm-up replay (excluded from measurement) | none |
| 2 | Run 5 min Derkachi replay at 3 Hz target cadence | per-frame latency: t_emit_at_sitl − t_capture |
| 3 | Record per-frame latency to CSV; compute p50, p95, p99 | distribution |
| 4 | Repeat at +50 °C ambient (chamber if available, else flagged) | distribution under thermal-throttle hybrid |

Pass criteria:

  • (a) K=3 baseline: p95 ≤ 400 ms (AC-4.1 hard bound).
  • (b) K=2 + Jacobian-cov hybrid: p95 ≤ 400 ms still satisfied after auto-degrade (proves D-CROSS-LATENCY-1 effective).
  • ≤10% frame drops under sustained load (AC-4.1 allowance).
  • Per-stage latency partitioning (D-CROSS-LATENCY-1 table) recorded for all stages: C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter — used in NFT-PERF-01 evidence bundle for budget-margin tracking.

Duration: 2 × 5.5 min replays (warm-up + measurement) per configuration; ~25 min total per FC adapter.
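
The evaluation in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the project's harness: the CSV column names (`t_capture`, `t_emit_at_sitl`, in seconds), the convention that an empty `t_emit_at_sitl` marks a dropped frame, and the nearest-rank percentile method are all assumptions.

```python
import csv
import io
import math

P95_BUDGET_MS = 400.0   # AC-4.1 hard bound
MAX_DROP_RATIO = 0.10   # AC-4.1 frame-drop allowance


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def evaluate(csv_text):
    """Return (stats, passed) for a per-frame latency CSV.

    Assumed columns: t_capture, t_emit_at_sitl (seconds). An empty
    t_emit_at_sitl is treated as a dropped frame (assumption).
    """
    latencies_ms, dropped, total = [], 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if not row["t_emit_at_sitl"]:
            dropped += 1
            continue
        delta_s = float(row["t_emit_at_sitl"]) - float(row["t_capture"])
        latencies_ms.append(delta_s * 1000.0)
    stats = {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
    stats["drop_ratio"] = dropped / total
    passed = (stats["p95"] <= P95_BUDGET_MS
              and stats["drop_ratio"] <= MAX_DROP_RATIO)
    return stats, passed
```

Running `evaluate` once per configuration, (a) and (b), yields the two distributions the pass criteria compare against the same 400 ms bound.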


NFT-PERF-02: Frame-by-frame streaming (no batching)

Summary: Validates AC-4.4: estimates are streamed frame-by-frame with no batching or delay.
Traces to: AC-4.4
Metric: Inter-emit interval at SITL.

Preconditions:

  • Tier-1 OR Tier-2.
  • SUT warmed up for 30 s.

Steps:

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay Derkachi 5 min at 3 Hz | per-frame inter-emit interval at SITL |
| 2 | Compute distribution | p95 of inter-emit interval |

Pass criteria: p95 inter-emit interval ≤ inter-frame-interval × 1.05 (i.e. ≤ ~350 ms at 3 Hz target). No window of ≥3 missed-emit gaps.

Duration: 6 min.
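
The pass criterion reduces to simple arithmetic: at 3 Hz the inter-frame interval is 1/3 s ≈ 333 ms, and 333 ms × 1.05 ≈ 350 ms. The sketch below checks both conditions on a list of emit arrival times. It reads "missed-emit gap" as an interval longer than 1.5 frame intervals and "window of ≥3" as three or more such intervals in a row. Both that reading and the `missed_factor` value are assumptions, not definitions taken from AC-4.4.

```python
import math

TARGET_HZ = 3.0
FRAME_INTERVAL_S = 1.0 / TARGET_HZ
P95_LIMIT_S = FRAME_INTERVAL_S * 1.05  # about 0.350 s at 3 Hz


def inter_emit_intervals(emit_times_s):
    """Differences between consecutive emit arrival times at SITL."""
    return [b - a for a, b in zip(emit_times_s, emit_times_s[1:])]


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.95 * len(ordered))) - 1]


def longest_missed_run(intervals, missed_factor=1.5):
    """Longest run of consecutive over-long intervals.

    An interval counts as a missed-emit gap when it exceeds
    missed_factor frame intervals (hypothetical threshold).
    """
    longest = current = 0
    for dt in intervals:
        current = current + 1 if dt > missed_factor * FRAME_INTERVAL_S else 0
        longest = max(longest, current)
    return longest


def streaming_ok(emit_times_s):
    """True when both NFT-PERF-02 conditions hold under the assumptions above."""
    intervals = inter_emit_intervals(emit_times_s)
    return p95(intervals) <= P95_LIMIT_S and longest_missed_run(intervals) < 3
```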


NFT-PERF-03: Cold-start TTFF

Summary: Validates AC-NEW-1 cold-start time-to-first-fix from companion boot.
Traces to: AC-NEW-1
Metric: Wall-clock from SUT container-ready event (or systemctl start on Tier-2) to first valid outbound GPS_INPUT / MSP2_SENSOR_GPS arrival at SITL.

Preconditions:

  • Tier-2 (Jetson) for the canonical run; Tier-1 acceptable for trend-tracking.
  • cold-boot-fixture provides the FC EKF snapshot (loaded into SITL before the SUT cold boot).
  • tile-cache-fixture already mounted (cache-load is part of the TTFF budget per AC-NEW-1 wording "from boot").
  • 50 cold boots executed back-to-back to populate distribution.

Steps:

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Stop SUT; clear in-memory state | container down |
| 2 | Start SUT (record t_start) | timestamp |
| 3 | First outbound message arrives at SITL (record t_first_emit) | TTFF = t_first_emit − t_start |
| 4 | Repeat 50 times | distribution |

Pass criteria: p95 TTFF < 30 s.

Duration: ~30 min (50 × ~30 s + restart overhead).
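
The four steps map onto a small trial loop. In the sketch below, `stop_sut`, `start_sut`, and `wait_first_emit` are hypothetical callables standing in for whatever the harness uses to control the container and to block until the first valid message reaches SITL. The clock is injectable so the loop can be exercised without real hardware.

```python
import math
import time

TTFF_P95_LIMIT_S = 30.0  # AC-NEW-1 pass criterion: p95 TTFF < 30 s


def measure_ttff(stop_sut, start_sut, wait_first_emit,
                 trials=50, clock=time.monotonic):
    """Run cold-start trials and return the TTFF samples in seconds.

    All three callables are hypothetical harness hooks; wait_first_emit
    must block until the first valid outbound message arrives at SITL.
    """
    samples = []
    for _ in range(trials):
        stop_sut()                # step 1: SUT down, in-memory state cleared
        t_start = clock()         # step 2: record t_start, then start
        start_sut()
        wait_first_emit()         # step 3: first outbound message at SITL
        samples.append(clock() - t_start)
    return samples


def ttff_passes(samples):
    """Nearest-rank p95 of the samples, compared against the 30 s limit."""
    ordered = sorted(samples)
    p95 = ordered[max(1, math.ceil(0.95 * len(ordered))) - 1]
    return p95 < TTFF_P95_LIMIT_S
```

A monotonic clock is used because wall-clock adjustments during a 30-minute run would otherwise distort individual samples.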


NFT-PERF-04: Spoofing-promotion latency

Summary: Validates AC-NEW-2: when the FC signals GPS denial or spoofing, the onboard estimate is promoted to the FC's primary position source in under 3 s (p95).
Traces to: AC-NEW-2
Metric: Latency from spoof-onset signal to FC-side EKF source-set switch (AP: EK3_SRC1_POSXY flips to companion-source value; iNav: GPS provider state reflects companion as primary).

Preconditions:

  • Tier-1 acceptable (mostly software loops + SITL).
  • derkachi-fixture running with SUT in satellite_anchored steady state.
  • Spoof injector primed.

Steps:

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Inject false GPS into FC SITL (record t_spoof_onset) | timestamp |
| 2 | Observe FC EKF source-set state via parameter read polling at 100 Hz (record t_promotion) | promotion latency = t_promotion − t_spoof_onset |
| 3 | Repeat 50 trials per FC (parameterized on ardupilot + inav) | distribution per FC |

Pass criteria: p95 < 3 s on both FCs.

Duration: ~25 min per FC (50 trials × ~30 s including pre-trial reset).
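
A single trial of steps 1 and 2 can be sketched as a bounded polling loop. `inject_spoof` and `source_is_companion` are hypothetical hooks: the second would wrap the FC-specific check (the EK3_SRC1_POSXY read on ArduPilot, the GPS provider state on iNav). The `timeout_s` default is likewise an assumption. Polling at 100 Hz quantizes each sample to steps of about 10 ms, which is small against the 3 s bound.

```python
import time

POLL_HZ = 100.0
POLL_PERIOD_S = 1.0 / POLL_HZ


def measure_promotion_latency(inject_spoof, source_is_companion,
                              timeout_s=10.0,
                              clock=time.monotonic, sleep=time.sleep):
    """Return promotion latency in seconds, or None on timeout.

    inject_spoof and source_is_companion are hypothetical harness hooks;
    timeout_s is an assumed safety bound, not a value from AC-NEW-2.
    """
    t_spoof_onset = clock()           # step 1: record onset, then inject
    inject_spoof()
    deadline = t_spoof_onset + timeout_s
    while clock() < deadline:
        if source_is_companion():     # step 2: poll the FC source-set state
            return clock() - t_spoof_onset
        sleep(POLL_PERIOD_S)
    return None                       # no promotion observed within timeout
```

Returning `None` on timeout keeps failed trials distinguishable from slow ones, so they can be counted separately rather than silently skewing the p95.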


Per-stage latency partition record (informational, not pass/fail)

NFT-PERF-01 captures per-stage latencies matching the D-CROSS-LATENCY-1 partition table from solution.md. The recorded targets are tracked for budget-margin trend (regression detector), not as independent pass/fail thresholds — only AC-4.1 p95 ≤ 400 ms is the hard gate.

| Stage | K=3 target p95 | K=2 hybrid target p95 |
|-------|----------------|-----------------------|
| C1 OKVIS2 VIO | ≤ 60 ms | ≤ 60 ms |
| C2 UltraVPR query | ≤ 15 ms | ≤ 15 ms |
| C2.5 Top-N re-rank | ≤ 80 ms | ≤ 80 ms |
| C3 DISK+LightGlue × N | ≤ 200 ms (steady) | ≤ 140 ms (thermal) |
| C3.5 AdHoP (conditional, p99) | ≤ 100 ms when triggered | ≤ 60 ms when triggered |
| C4 solvePnPRansac | ≤ 25 ms | ≤ 25 ms |
| C4 covariance recovery | ≤ 100 ms (steady) | ≤ 25 ms (thermal) |
| C5 iSAM2 update | ≤ 15 ms | ≤ 15 ms |
| MAVLink/MSP2 + UART/USB | ≤ 30 ms | ≤ 30 ms |
| OS scheduling jitter (p99) | ≤ 50 ms | ≤ 50 ms |
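
One way to use these targets as a trend signal rather than a gate is to report each stage's margin against its target and flag stages that grew since a baseline run. The sketch below uses the K=3 column. The function names and the `tolerance_ms` default are hypothetical, and nothing here feeds into pass/fail.

```python
# K=3 target values (ms) copied from the partition table.
# C3.5 AdHoP and OS scheduling jitter are p99 targets; the rest are p95.
K3_TARGETS_MS = {
    "C1 OKVIS2 VIO": 60,
    "C2 UltraVPR query": 15,
    "C2.5 Top-N re-rank": 80,
    "C3 DISK+LightGlue x N": 200,
    "C3.5 AdHoP": 100,
    "C4 solvePnPRansac": 25,
    "C4 covariance recovery": 100,
    "C5 iSAM2 update": 15,
    "MAVLink/MSP2 + UART/USB": 30,
    "OS scheduling jitter": 50,
}


def budget_margins(measured_ms, targets_ms=K3_TARGETS_MS):
    """Return {stage: target - measured} in ms; negative means over target."""
    return {stage: targets_ms[stage] - value
            for stage, value in measured_ms.items() if stage in targets_ms}


def regressions(measured_ms, baseline_ms, tolerance_ms=5.0):
    """Stages that grew by more than tolerance_ms relative to a baseline run.

    tolerance_ms is a hypothetical noise allowance, not a documented value.
    """
    return sorted(stage for stage, value in measured_ms.items()
                  if stage in baseline_ms
                  and value - baseline_ms[stage] > tolerance_ms)
```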