[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios

Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators.

- NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms
  (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage
  partition recording (D-CROSS-LATENCY-1).
- NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit
  windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP).
- NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s
  over N≥10 iterations.
- NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20
  randomized-start blackout+spoof events.

All scenarios consume external fixtures (AZ-595 dependency surfaced) and
fail loudly when fixtures are missing or empty. Public-boundary
discipline preserved — evaluators do NOT import src/gps_denied_onboard.

Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3
vio). Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch),
3 Low (production-dependency surfacings + future hygiene).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 16:46:49 +03:00
parent f25cae4a82
commit 73cd632e95
21 changed files with 3063 additions and 6 deletions
@@ -0,0 +1,85 @@
# NFT-PERF-01 — End-to-end latency p95 budget
**Task**: AZ-428_nft_perf_01_e2e_latency
**Name**: AC-4.1 latency p95 ≤ 400 ms on Tier-2 (Jetson Orin Nano Super) at 25 °C and 50 °C ambient (AC-4.1, D-CROSS-LATENCY-1)
**Description**: Implement NFT-PERF-01 — Tier-2-only; 30 s warm-up + 5 min Derkachi replay at 3 Hz; per-frame latency = `t_emit_at_sitl - t_capture`; record CSV; compute p50/p95/p99; per-stage latency partitioning per D-CROSS-LATENCY-1 table; repeat at +50 °C ambient.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-444 (Tier-2 runner)
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-428
**Epic**: AZ-262 (E-BBT)
## Problem
End-to-end latency is the AC-4.1 hard gate; D-CROSS-LATENCY-1 sets a 400 ms p95 budget across two configurations (K=3 baseline at 25 °C, K=2 + Jacobian-cov hybrid at 50 °C). Without measurement on real Jetson hardware these budgets are unverified.
## Outcome
- pytest scenario at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py`. Tier-2 ONLY (skipped on Tier-1 with the documented reason).
- Two configurations measured:
- (a) K=3 baseline at +25 °C
- (b) K=2 + Jacobian-cov hybrid auto-degrade at +50 °C ambient (chamber if available, else flagged)
- Per-frame latency CSV; p50/p95/p99 computation; assertion `p95 ≤ 400 ms` per config.
- Per-stage latency partitioning per D-CROSS-LATENCY-1 table (informational only — recorded for trend; not pass/fail).
## Scope
### Included
- Tier-2 entrypoint via `run-tier2.sh`.
- 30 s warm-up + 5 min measurement window.
- Per-frame `t_emit_at_sitl - t_capture` measurement.
- Per-stage partition (C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter) recorded for trend.
- Frame-drop accounting (≤10 % under sustained load per AC-4.1).
### Excluded
- Frame-by-frame inter-emit interval — owned by NFT-PERF-02 (AZ-429).
- Cold-start TTFF — owned by NFT-PERF-03 (AZ-430).
- Spoofing-promotion latency — owned by NFT-PERF-04 (AZ-431).
- The +50 °C chamber portion of AC-NEW-5 — that's a release-gate test handled separately.
## Acceptance Criteria
**AC-1: tier guard**
Given the test is invoked
When `tier == tier1-docker`
Then the test SKIPs with reason `Tier-2 only — Jetson hardware required`.
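A minimal sketch of the guard as a pure function, so the decision can be unit-tested without a runner. The function name and the `tier2-jetson` label used in the usage note are hypothetical; only `tier1-docker` and the reason string come from AC-1.

```python
from typing import Optional

# Reason string quoted verbatim from AC-1.
TIER2_ONLY_REASON = "Tier-2 only — Jetson hardware required"


def tier_skip_reason(tier: str) -> Optional[str]:
    """Return the skip reason on Tier-1, or None when the scenario may run."""
    return TIER2_ONLY_REASON if tier == "tier1-docker" else None
```

In the scenario body this would typically be wired as `reason = tier_skip_reason(tier)`, followed by `pytest.skip(reason)` when `reason` is not `None`; any other tier value, such as a hypothetical `tier2-jetson`, falls through and runs.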
**AC-2: K=3 baseline @ +25 °C — p95 budget**
Given the K=3 baseline configuration at +25 °C ambient
When the 5 min measurement window completes
Then `p95(t_emit_at_sitl - t_capture) ≤ 400 ms`.
**AC-3: K=2 hybrid @ +50 °C — p95 budget**
Given the K=2 + Jacobian-cov hybrid auto-degrade configuration at +50 °C
When the 5 min measurement window completes
Then `p95 ≤ 400 ms` is STILL satisfied (this shows that the D-CROSS-LATENCY-1 auto-degrade is effective).
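The p95 gate in AC-2 and AC-3 can be sketched as a pure-logic evaluator over paired timestamps. This is an illustration under stated assumptions, not the batch's actual evaluator: the function names are hypothetical, timestamps are assumed to be in milliseconds on a shared clock, and the percentile uses linear interpolation because the spec does not say which definition applies.

```python
import math
from typing import Dict, Sequence

P95_BUDGET_MS = 400.0  # D-CROSS-LATENCY-1 budget, same for both configurations


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100] (assumed definition)."""
    if not values:
        raise ValueError("no latency samples: fixture missing or empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def latency_report(t_capture_ms: Sequence[float],
                   t_emit_at_sitl_ms: Sequence[float]) -> Dict[str, float]:
    """Per-frame latency = t_emit_at_sitl - t_capture, summarised as p50/p95/p99."""
    if len(t_capture_ms) != len(t_emit_at_sitl_ms):
        raise ValueError("capture and emit series must pair up frame by frame")
    latencies = [emit - cap for cap, emit in zip(t_capture_ms, t_emit_at_sitl_ms)]
    return {f"p{q}": percentile(latencies, q) for q in (50, 95, 99)}


def within_budget(report: Dict[str, float], budget_ms: float = P95_BUDGET_MS) -> bool:
    """True when the p95 latency meets the budget."""
    return report["p95"] <= budget_ms
```

Raising on an empty series mirrors the fail-loudly rule for missing or empty fixtures, so an empty recording cannot pass the gate by accident.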
**AC-4: frame-drop allowance**
Given a sustained 5 min run
Then the frame-drop count is ≤10 % of expected emissions (3 Hz × 300 s = 900 expected).
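The allowance works out to at most 90 dropped frames out of 900. A small sketch of that arithmetic follows; the function name and parameters are illustrative, and integer math is used so the 10 % boundary is not affected by floating-point rounding.

```python
from typing import Tuple


def frame_drop_ok(emitted: int, rate_hz: int = 3, window_s: int = 300,
                  max_drop_percent: int = 10) -> Tuple[int, bool]:
    """Return (dropped, ok) for a measurement window.

    expected = rate_hz * window_s (900 for 3 Hz over 300 s).
    """
    expected = rate_hz * window_s
    dropped = max(0, expected - emitted)  # extra emissions are not counted as drops
    return dropped, dropped * 100 <= max_drop_percent * expected
```

For example, `frame_drop_ok(810)` yields `(90, True)` and `frame_drop_ok(809)` yields `(91, False)`.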
**AC-5: per-stage partition recorded**
Given the partition table (D-CROSS-LATENCY-1)
Then per-stage p95 values are recorded for all named stages; values are emitted into the evidence bundle as `nft-perf-01-partition.csv`. Assertion is informational (no hard threshold).
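One possible shape for the partition artifact is sketched below. The stage labels are taken from the Scope list; the column names `stage` and `p95_ms` and the function name are assumptions, since the CSV layout is not specified here. The sketch only enforces that every named stage has a value and applies no threshold, which matches the informational status of this criterion.

```python
import csv
import io
from typing import Mapping

# Stage labels as listed in the Scope section; column names are illustrative.
STAGES = ("C1 OKVIS2", "C2 UltraVPR", "C2.5", "C3", "C3.5",
          "C4", "C4 cov", "C5", "serialization", "OS jitter")


def partition_csv(stage_p95_ms: Mapping[str, float]) -> str:
    """Render per-stage p95 values as CSV text; fail if a named stage is missing."""
    missing = [s for s in STAGES if s not in stage_p95_ms]
    if missing:
        raise ValueError(f"no p95 recorded for stages: {missing}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["stage", "p95_ms"])
    for stage in STAGES:
        writer.writerow([stage, f"{stage_p95_ms[stage]:.3f}"])
    return buf.getvalue()
```

The returned text would then be written to `nft-perf-01-partition.csv` in the evidence bundle.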
**AC-6: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`. Both configurations (a) and (b) run per parameterization.
## System Under Test Boundary
End-to-end on real hardware through public boundaries.
- **Allowed**: SITL-side timestamping at message receipt; `t_capture` from frame source metadata (a public artifact).
- **Forbidden**: instrumenting the SUT internally to read per-component timings; per-stage partitioning relies on what the SUT exposes via `NAMED_VALUE_FLOAT` or FDR (already-public).
## Constraints
- Tier-2 only — invoking on Tier-1 SKIPs.
- Chamber availability for +50 °C is a documented optional environment; absent a chamber, the test runs at workstation ambient and emits a `chamber-unavailable` flag in the evidence.
- Per-stage partition is informational only (no pass/fail threshold).
## Document Dependencies
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-01
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-01 row)
@@ -0,0 +1,58 @@
# NFT-PERF-02 — Frame-by-frame streaming (no batching)
**Task**: AZ-429_nft_perf_02_streaming
**Name**: Estimates streamed frame-by-frame; no batching/delay (AC-4.4)
**Description**: Implement NFT-PERF-02 — replay Derkachi 5 min at 3 Hz; observe inter-emit interval at SITL; assert p95 inter-emit ≤ inter-frame × 1.05 (≤350 ms at 3 Hz target); no window of ≥3 missed-emit gaps.
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-429
**Epic**: AZ-262 (E-BBT)
## Problem
Frame-by-frame streaming (vs hidden batching) is an AC-4.4 contract: the operator UX assumes a steady cadence. It is easy to validate via the inter-emit interval distribution.
## Outcome
- pytest scenario at `e2e/tests/performance/test_nft_perf_02_streaming.py`. Tier-1 OR Tier-2.
- 5 min Derkachi replay at 3 Hz target; SITL-side inter-emit interval distribution; assert p95 ≤ 350 ms (inter-frame × 1.05); no window of ≥3 consecutive missed emits.
## Scope
### Included
- Inter-emit interval measurement at SITL.
- p95 assertion + missed-emit-window assertion.
### Excluded
- End-to-end latency (`t_capture → t_emit`) — owned by NFT-PERF-01.
## Acceptance Criteria
**AC-1: p95 inter-emit ≤ 350 ms**
Given the 5 min replay at 3 Hz target
Then `p95(inter_emit_interval) ≤ 350 ms`.
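The 350 ms figure is the 3 Hz inter-frame interval (about 333.3 ms) times 1.05. The sketch below derives that budget and the interval series from SITL receipt timestamps. All names are hypothetical, timestamps are assumed to be in milliseconds, and the linear-interpolation percentile is an assumption, as the spec does not name a definition.

```python
import math
from typing import List, Sequence


def inter_emit_budget_ms(rate_hz: float = 3.0, tolerance: float = 1.05) -> float:
    """Inter-frame interval times the tolerance factor (about 350 ms at 3 Hz)."""
    return 1000.0 / rate_hz * tolerance


def inter_emit_intervals(receipt_ts_ms: Sequence[float]) -> List[float]:
    """Deltas between consecutive accepted messages, in receipt order."""
    if len(receipt_ts_ms) < 2:
        raise ValueError("need at least two accepted messages")
    return [b - a for a, b in zip(receipt_ts_ms, receipt_ts_ms[1:])]


def p95(values: Sequence[float]) -> float:
    """95th percentile with linear interpolation (assumed definition)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * 95 / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
```

A scenario would then assert `p95(inter_emit_intervals(ts)) <= inter_emit_budget_ms()`.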
**AC-2: no ≥3-emit gap**
Given the inter-emit stream
Then no window contains ≥3 consecutive missed emits (a "missed emit" is an interval > 2 × the target inter-frame interval, i.e. ≈ 666 ms at 3 Hz).
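The missed-emit rule reduces to finding the longest run of over-threshold intervals. A minimal sketch follows, with a hypothetical function name and the 666 ms threshold taken from the figure quoted in this criterion:

```python
from typing import Iterable


def longest_missed_run(intervals_ms: Iterable[float],
                       missed_threshold_ms: float = 666.0) -> int:
    """Length of the longest run of consecutive intervals above the threshold."""
    longest = run = 0
    for interval in intervals_ms:
        run = run + 1 if interval > missed_threshold_ms else 0
        longest = max(longest, run)
    return longest
```

The criterion holds when `longest_missed_run(intervals) < 3`. Isolated gaps, or two in a row, are tolerated by this rule; whether they still pass overall depends on the p95 assertion.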
**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)` on both tiers.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: SITL receipt timestamps.
- **Forbidden**: importing SUT scheduler internals.
## Constraints
- "Inter-emit interval" is the SITL-side timestamp delta between consecutive accepted GPS_INPUT / MSP2_SENSOR_GPS messages.
## Document Dependencies
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-02
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-02 row)
@@ -0,0 +1,70 @@
# NFT-PERF-03 — Cold-start TTFF
**Task**: AZ-430_nft_perf_03_ttff
**Name**: AC-NEW-1 cold-start TTFF ≤ 30 s on Tier-2 (AC-NEW-1)
**Description**: Implement NFT-PERF-03 — Tier-2; cold-boot SITL with `cold-boot-fixture`; cold-start SUT (no FDR / cache warm-up); push first nav-camera frame; measure `t_first_emission - t_first_frame_arrival ≤ 30 s`; record p50/p95.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-444
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-430
**Epic**: AZ-262 (E-BBT)
## Problem
Cold-start TTFF (Time To First Fix) is critical for flight-resume scenarios — AC-NEW-1 sets the 30 s budget. This must be measured on real Jetson hardware over a sample large enough for p95 confidence.
## Outcome
- pytest scenario at `e2e/tests/performance/test_nft_perf_03_ttff.py`. Tier-2 ONLY.
- Sample of N=10 cold-starts (configurable via env var; default 10 to balance run time vs confidence).
- Each iteration: clean `fdr-output` volume; load `cold-boot-fixture` into SITL; start SUT; push first frame; measure TTFF.
- Assert `p95 TTFF ≤ 30 s` AND `max ≤ 45 s` (a relaxed cap to flag tail-latency outliers).
## Scope
### Included
- N=10 cold-start iterations.
- Per-iteration TTFF measurement.
- p50/p95/max computation + assertions.
### Excluded
- Position accuracy at first emission — owned by FT-P-11 (AZ-419).
## Acceptance Criteria
**AC-1: tier guard**
Given the test is invoked with `tier == tier1-docker`
Then the test SKIPs.
**AC-2: clean state per iteration**
Given each cold-start iteration
Then `fdr-output` volume is removed + recreated AND any persistent SUT state is cleared.
**AC-3: TTFF p95**
Given N=10 (or more) iterations
Then `p95(TTFF) ≤ 30 s`.
**AC-4: TTFF max**
Given the same iterations
Then `max(TTFF) ≤ 45 s`.
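AC-3 and AC-4 together can be sketched as one evaluator over the per-iteration TTFF samples. The function name and result keys are hypothetical, and the percentile definition (linear interpolation) is an assumption, since the spec does not name one.

```python
import math
from typing import Dict, Sequence


def evaluate_ttff(ttff_s: Sequence[float], min_iterations: int = 10,
                  p95_budget_s: float = 30.0,
                  max_budget_s: float = 45.0) -> Dict[str, object]:
    """Check p95(TTFF) and max(TTFF) against their budgets."""
    if len(ttff_s) < min_iterations:
        raise ValueError(f"need at least {min_iterations} cold-start iterations")
    ordered = sorted(ttff_s)
    # Linear-interpolation p95 (assumed definition).
    pos = (len(ordered) - 1) * 95 / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    p95 = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    worst = ordered[-1]
    return {"p95_s": p95, "max_s": worst,
            "p95_ok": p95 <= p95_budget_s, "max_ok": worst <= max_budget_s}
```

The choice of percentile definition matters at small N. With a nearest-rank p95 and fewer than 20 samples, the p95 is simply the largest sample, so the 45 s cap could never fire independently of the 30 s gate. With interpolation, a single slow start among ten (for example nine at 20 s and one at 40 s) already pushes the p95 above 30 s while staying under the max cap.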
**AC-5: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end on real hardware through public boundaries.
- **Allowed**: SITL parameter load, SITL state read, SUT lifecycle (`docker compose up` / `systemctl start`).
- **Forbidden**: pre-warming SUT internal caches.
## Constraints
- Tier-2 only.
- N=10 is the default; more iterations strengthen the p95 estimate but cost CI time.
## Document Dependencies
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-03
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-03 row)
@@ -0,0 +1,61 @@
# NFT-PERF-04 — Spoofing-promotion latency budget
**Task**: AZ-431_nft_perf_04_spoof_promotion
**Name**: AC-NEW-2 spoofing-promotion latency p95 ≤ 600 ms (AC-NEW-2)
**Description**: Implement NFT-PERF-04 — Tier-1 OR Tier-2; replay multiple blackout-onset events with paired GPS spoof injection (sample N=20+ via `blackout_spoof.py` with random window starts); per event measure `t_label_switch_to_dead_reckoned - t_blackout_onset`; assert `p95 ≤ 600 ms`.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Performance / Security (epic AZ-262)
**Tracker**: AZ-431
**Epic**: AZ-262 (E-BBT)
## Problem
Spoofing-promotion latency (the time from a spoof+blackout event to the SUT correctly labeling estimates `dead_reckoned`) is a security-critical metric — slow promotion means the FC may briefly trust spoofed GPS. AC-NEW-2 sets the p95 ≤ 600 ms budget; this requires statistical sampling.
## Outcome
- pytest scenario at `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py`. Tier-1 OR Tier-2.
- N=20 events (configurable; balances confidence vs CI time): each iteration places `blackout_spoof.py` at a different random window start in the Derkachi flight.
- Per event: `t_blackout_onset` = injector-emitted timestamp; `t_label_switch_to_dead_reckoned` = first outbound emission with `source_label = dead_reckoned` after onset.
- Assert `p95(latency) ≤ 600 ms`.
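The per-event measurement described above can be sketched as a scan over the captured outbound stream. The function name, the `(timestamp_ms, source_label)` tuple layout, and the `gps` label in the usage note are assumptions made for illustration; only `dead_reckoned` comes from this spec. An emission stamped exactly at onset is treated here as "after onset", which is also an assumption.

```python
from typing import Iterable, Tuple

DEAD_RECKONED = "dead_reckoned"


def promotion_latency_ms(t_blackout_onset_ms: float,
                         emissions: Iterable[Tuple[float, str]]) -> float:
    """Latency from onset to the first dead_reckoned emission at or after it."""
    for t_ms, source_label in sorted(emissions):
        if t_ms >= t_blackout_onset_ms and source_label == DEAD_RECKONED:
            return t_ms - t_blackout_onset_ms
    # Fail loudly rather than silently dropping the event from the sample.
    raise ValueError("no dead_reckoned emission observed after onset")
```

For an onset at 1000 ms and emissions `[(900.0, "dead_reckoned"), (1000.0, "gps"), (1450.0, "dead_reckoned")]`, the pre-onset emission is ignored and the latency is 450 ms.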
## Scope
### Included
- N=20 spoof-blackout events at randomized window starts.
- Per-event latency measurement.
- p95 assertion.
### Excluded
- Functional correctness of the failsafe ladder — owned by FT-N-04 (AZ-426).
## Acceptance Criteria
**AC-1: N events sampled**
Given the test runs
Then N≥20 spoof-blackout events are exercised across the Derkachi flight.
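One way to draw the randomized window starts reproducibly is a seeded generator, sketched below. The function name, the seed parameter, and the flight and window durations passed by the caller are all hypothetical. How `blackout_spoof.py` actually receives its window start is not specified here, and this sketch does not prevent windows from overlapping.

```python
import random
from typing import List


def sample_window_starts(flight_s: float, window_s: float,
                         n_events: int = 20, seed: int = 0) -> List[float]:
    """Draw n_events window starts uniformly so each window fits in the flight."""
    if n_events < 20:
        raise ValueError("AC-1 requires at least 20 events")
    if window_s >= flight_s:
        raise ValueError("window must be shorter than the flight")
    rng = random.Random(seed)  # fixed seed keeps a failing run reproducible
    return sorted(rng.uniform(0.0, flight_s - window_s) for _ in range(n_events))
```

Recording the seed in the evidence bundle would let a failing sample be replayed with the same window placements.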
**AC-2: latency p95**
Given the per-event latencies
Then `p95(latency) ≤ 600 ms`.
**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: blackout_spoof injector timestamps, SITL outbound capture.
- **Forbidden**: instrumenting SUT internals for the switch event; the switch is detected via outbound stream.
## Constraints
- The injector emits a timestamp at the moment of onset (a public artifact); this is the `t_blackout_onset` reference.
## Document Dependencies
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-04
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-04 row)