[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios

Batch 85: 4 Performance NFT scenarios + pure-logic evaluators.

- NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3 @ 25 °C, K=2 hybrid @ 50 °C) + frame-drop ≤ 10 % + informational per-stage partition recording (D-CROSS-LATENCY-1).
- NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no window of ≥3 consecutive missed emits. Includes fc-adapter-aware SITL timestamp extraction (tlog vs MSP).
- NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N ≥ 10 iterations.
- NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N ≥ 20 randomized-start blackout+spoof events.

All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline is preserved: evaluators do NOT import src/gps_denied_onboard.

Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio).

Code review: PASS_WITH_WARNINGS. 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 16:21:12 +00:00 · 2026-05-17 16:46:49 +03:00
parent f25cae4a82
commit 73cd632e95
21 changed files with 3063 additions and 6 deletions
@@ -0,0 +1,85 @@
+# NFT-PERF-01 — End-to-end latency p95 budget
+
+**Task**: AZ-428_nft_perf_01_e2e_latency
+**Name**: AC-4.1 latency p95 ≤ 400 ms on Tier-2 (Jetson Orin Nano Super) at 25 °C and 50 °C ambient (AC-4.1, D-CROSS-LATENCY-1)
+**Description**: Implement NFT-PERF-01 — Tier-2-only; 30 s warm-up + 5 min Derkachi replay at 3 Hz; per-frame latency = `t_emit_at_sitl − t_capture`; record CSV; compute p50/p95/p99; per-stage latency partitioning per D-CROSS-LATENCY-1 table; repeat with the K=2 + Jacobian-cov hybrid configuration at +50 °C ambient.
+**Complexity**: 5 points
+**Dependencies**: AZ-406, AZ-407, AZ-444 (Tier-2 runner)
+**Component**: Blackbox Tests / Performance (epic AZ-262)
+**Tracker**: AZ-428
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+End-to-end latency is the AC-4.1 hard gate; D-CROSS-LATENCY-1 sets a 400 ms p95 budget that must hold in two configurations (K=3 baseline at 25 °C, K=2 + Jacobian-cov hybrid at 50 °C). Without measurement on real Jetson hardware, the budget is unverified in either configuration.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py`. Tier-2 ONLY (skipped on Tier-1 with the documented reason).
+- Two configurations measured:
+  - (a) K=3 baseline at +25 °C
+  - (b) K=2 + Jacobian-cov hybrid auto-degrade at +50 °C ambient (chamber if available, else flagged)
+- Per-frame latency CSV; p50/p95/p99 computation; assertion `p95 ≤ 400 ms` per config.
+- Per-stage latency partitioning per D-CROSS-LATENCY-1 table (informational only — recorded for trend; not pass/fail).
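
A minimal sketch of the per-config evaluation described in the Outcome, assuming the capture and SITL-receipt timestamps (ms) have already been paired per frame from the CSV, with dropped frames excluded (those are counted separately under AC-4). The spec does not fix a percentile definition, so the nearest-rank method here is an assumption, and `summarize_latency` / `assert_p95_budget` are hypothetical helper names rather than the repository's evaluators.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (assumed definition; the spec does not fix one)."""
    if not values:
        raise ValueError("no latency samples: fixture missing or empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencySummary:
    p50_ms: float
    p95_ms: float
    p99_ms: float


def summarize_latency(t_capture_ms: Sequence[float],
                      t_emit_at_sitl_ms: Sequence[float]) -> LatencySummary:
    """Per-frame latency = t_emit_at_sitl - t_capture, paired by frame index."""
    if len(t_capture_ms) != len(t_emit_at_sitl_ms):
        raise ValueError("capture and emit series must be paired frame by frame")
    latencies = [emit - cap for cap, emit in zip(t_capture_ms, t_emit_at_sitl_ms)]
    return LatencySummary(
        p50_ms=percentile_nearest_rank(latencies, 50),
        p95_ms=percentile_nearest_rank(latencies, 95),
        p99_ms=percentile_nearest_rank(latencies, 99),
    )


def assert_p95_budget(summary: LatencySummary, budget_ms: float = 400.0) -> None:
    # Hard gate for AC-2 / AC-3; p50 and p99 are recorded but not asserted.
    assert summary.p95_ms <= budget_ms, (
        f"p95 {summary.p95_ms:.1f} ms exceeds {budget_ms:.0f} ms budget")
```

The same summary would be computed once for configuration (a) and once for (b), each against the 400 ms budget.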
+
+## Scope
+
+### Included
+- Tier-2 entrypoint via `run-tier2.sh`.
+- 30 s warm-up + 5 min measurement window.
+- Per-frame `t_emit_at_sitl − t_capture` measurement.
+- Per-stage partition (C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter) recorded for trend.
+- Frame-drop accounting (≤10 % under sustained load per AC-4.1).
+
+### Excluded
+- Frame-by-frame inter-emit interval — owned by NFT-PERF-02 (AZ-429).
+- Cold-start TTFF — owned by NFT-PERF-03 (AZ-430).
+- Spoofing-promotion latency — owned by NFT-PERF-04 (AZ-431).
+- The +50 °C chamber portion of AC-NEW-5 — that's a release-gate test handled separately.
+
+## Acceptance Criteria
+
+**AC-1: tier guard**
+Given the test is invoked
+When `tier == tier1-docker`
+Then the test SKIPs with reason `Tier-2 only — Jetson hardware required`.
+
+**AC-2: K=3 baseline @ +25 °C — p95 budget**
+Given the K=3 baseline configuration at +25 °C ambient
+When the 5 min measurement window completes
+Then `p95(t_emit_at_sitl − t_capture) ≤ 400 ms`.
+
+**AC-3: K=2 hybrid @ +50 °C — p95 budget**
+Given the K=2 + Jacobian-cov hybrid auto-degrade configuration at +50 °C
+When the 5 min measurement window completes
+Then `p95(t_emit_at_sitl − t_capture) ≤ 400 ms` is STILL satisfied (evidence that the D-CROSS-LATENCY-1 auto-degrade keeps latency within budget).
+
+**AC-4: frame-drop allowance**
+Given a sustained 5 min run
+Then the frame-drop count is ≤10 % of expected emissions (3 Hz × 300 s = 900 expected, so at most 90 dropped frames).
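
As a worked check on the AC-4 arithmetic: with 900 expected emissions, a ≤10 % allowance tolerates at most 90 missing frames, i.e. at least 810 emissions must reach SITL inside the window. The sketch assumes the scenario has already counted accepted emissions in the 5 min window; the helper names are hypothetical.

```python
def expected_emissions(rate_hz: float = 3.0, window_s: float = 300.0) -> int:
    """Expected emissions over the measurement window (3 Hz x 300 s = 900)."""
    return round(rate_hz * window_s)


def frame_drop_fraction(emitted: int, rate_hz: float = 3.0,
                        window_s: float = 300.0) -> float:
    """Fraction of expected emissions that never arrived at SITL."""
    expected = expected_emissions(rate_hz, window_s)
    if expected <= 0:
        raise ValueError("window must produce at least one expected emission")
    dropped = max(0, expected - emitted)  # surplus emissions do not offset drops
    return dropped / expected


def assert_frame_drop_allowance(emitted: int, max_fraction: float = 0.10) -> None:
    fraction = frame_drop_fraction(emitted)
    assert fraction <= max_fraction, (
        f"frame drop {fraction:.1%} exceeds {max_fraction:.0%} allowance")
```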
+
+**AC-5: per-stage partition recorded**
+Given the partition table (D-CROSS-LATENCY-1)
+Then per-stage p95 values are recorded for all named stages and emitted into the evidence bundle as `nft-perf-01-partition.csv`. The check is informational (no hard threshold).
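
A sketch of how the informational partition could be written, assuming per-stage timing samples (ms) have already been read from what the SUT publishes via `NAMED_VALUE_FLOAT` / FDR. The stage labels mirror the Scope list; the CSV column layout and the nearest-rank p95 are assumptions. A stage with no samples gets an empty p95 cell instead of failing the run, since AC-5 sets no threshold.

```python
import csv
import io
import math
from typing import Dict, Sequence, TextIO

# Stage labels copied from the Scope list; the column names are assumptions.
STAGES = ("C1 OKVIS2", "C2 UltraVPR", "C2.5", "C3", "C3.5", "C4", "C4 cov",
          "C5", "serialization", "OS jitter")


def p95_nearest_rank(values: Sequence[float]) -> float:
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def write_partition_csv(stage_samples_ms: Dict[str, Sequence[float]],
                        out: TextIO) -> None:
    """Write one row per named stage; informational only, nothing is asserted."""
    writer = csv.writer(out)
    writer.writerow(["stage", "n_samples", "p95_ms"])
    for stage in STAGES:
        samples = stage_samples_ms.get(stage, ())
        # Empty cell keeps a stage with no published timings visible in trends.
        p95 = f"{p95_nearest_rank(samples):.1f}" if samples else ""
        writer.writerow([stage, len(samples), p95])


if __name__ == "__main__":
    buf = io.StringIO()
    write_partition_csv({"C1 OKVIS2": [10.0, 12.0, 30.0]}, buf)
    print(buf.getvalue())
```

For the real evidence file, opening it with `open(path, "w", newline="")` follows the `csv` module's guidance on line endings.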
+
+**AC-6: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`. Both configurations (a) and (b) run per parameterization.
+
+## System Under Test Boundary
+
+End-to-end on real hardware through public boundaries.
+
+- **Allowed**: SITL-side timestamping at message receipt; `t_capture` from frame source metadata (a public artifact).
+- **Forbidden**: instrumenting the SUT internally to read per-component timings; per-stage partitioning relies on what the SUT exposes via `NAMED_VALUE_FLOAT` or FDR (both already public).
+
+## Constraints
+
+- Tier-2 only — invoking on Tier-1 SKIPs.
+- Chamber availability for +50 °C is a documented optional environment; absent a chamber, the test runs at workstation ambient and emits a `chamber-unavailable` flag in the evidence.
+- Per-stage partition is informational only (no pass/fail threshold).
+
+## Document Dependencies
+
+- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-01
+- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-01 row)
@@ -0,0 +1,58 @@
+# NFT-PERF-02 — Frame-by-frame streaming (no batching)
+
+**Task**: AZ-429_nft_perf_02_streaming
+**Name**: Estimates streamed frame-by-frame; no batching/delay (AC-4.4)
+**Description**: Implement NFT-PERF-02 — replay Derkachi 5 min at 3 Hz; observe inter-emit interval at SITL; assert p95 inter-emit ≤ inter-frame × 1.05 (≤350 ms at 3 Hz target); no window of ≥3 consecutive missed emits.
+**Complexity**: 2 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Performance (epic AZ-262)
+**Tracker**: AZ-429
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Frame-by-frame streaming (vs hidden batching) is an AC-4.4 contract — the operator UX assumes a steady cadence. It is straightforward to validate from the inter-emit interval distribution.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/performance/test_nft_perf_02_streaming.py`. Tier-1 OR Tier-2.
+- 5 min Derkachi replay at 3 Hz target; SITL-side inter-emit interval distribution; assert p95 ≤ 350 ms (inter-frame × 1.05); no window of ≥3 consecutive missed emits.
+
+## Scope
+
+### Included
+- Inter-emit interval measurement at SITL.
+- p95 assertion + missed-emit-window assertion.
+
+### Excluded
+- End-to-end latency (`t_capture → t_emit`) — owned by NFT-PERF-01.
+
+## Acceptance Criteria
+
+**AC-1: p95 inter-emit ≤ 350 ms**
+Given the 5 min replay at 3 Hz target
+Then `p95(inter_emit_interval) ≤ 350 ms`.
+
+**AC-2: no ≥3-emit gap**
+Given the inter-emit stream
+Then no window contains ≥3 consecutive missed emits (a "missed emit" is an inter-emit interval > 2 × the target interval, i.e. > ≈666.7 ms at 3 Hz).
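
A minimal sketch of the AC-1 and AC-2 checks, assuming the accepted GPS_INPUT / MSP2_SENSOR_GPS receipt timestamps (ms) have already been extracted as described under Constraints. It reads AC-2 as "no run of three or more consecutive intervals, each longer than 2 × the target interval (≈ 666.7 ms at 3 Hz)". That reading, the nearest-rank p95, and the helper names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

TARGET_RATE_HZ = 3.0
P95_BUDGET_MS = 350.0   # inter-frame (333.3 ms) x 1.05
MAX_MISSED_RUN = 2      # a run of 3+ missed emits fails AC-2


def inter_emit_intervals_ms(emit_ts_ms: Sequence[float]) -> List[float]:
    if len(emit_ts_ms) < 2:
        raise ValueError("need at least two accepted emissions")
    return [b - a for a, b in zip(emit_ts_ms, emit_ts_ms[1:])]


def p95_nearest_rank(values: Sequence[float]) -> float:
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def longest_missed_run(intervals_ms: Sequence[float],
                       rate_hz: float = TARGET_RATE_HZ) -> int:
    """Longest streak of consecutive intervals longer than 2 x the target interval."""
    missed_threshold_ms = 2 * 1000.0 / rate_hz
    longest = current = 0
    for dt in intervals_ms:
        current = current + 1 if dt > missed_threshold_ms else 0
        longest = max(longest, current)
    return longest


def evaluate_streaming(emit_ts_ms: Sequence[float]) -> Tuple[float, int]:
    intervals = inter_emit_intervals_ms(emit_ts_ms)
    p95 = p95_nearest_rank(intervals)
    run = longest_missed_run(intervals)
    assert p95 <= P95_BUDGET_MS, f"inter-emit p95 {p95:.1f} ms > {P95_BUDGET_MS} ms"
    assert run <= MAX_MISSED_RUN, f"{run} consecutive missed emits"
    return p95, run
```

Checking the run length rather than the total count of long intervals lets isolated hiccups through while still failing a sustained stall.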
+
+**AC-3: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)` on both tiers.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: SITL receipt timestamps.
+- **Forbidden**: importing SUT scheduler internals.
+
+## Constraints
+
+- "Inter-emit interval" is the SITL-side timestamp delta between consecutive accepted GPS_INPUT / MSP2_SENSOR_GPS messages.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-02
+- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-02 row)
@@ -0,0 +1,70 @@
+# NFT-PERF-03 — Cold-start TTFF
+
+**Task**: AZ-430_nft_perf_03_ttff
+**Name**: AC-NEW-1 cold-start TTFF ≤ 30 s on Tier-2 (AC-NEW-1)
+**Description**: Implement NFT-PERF-03 — Tier-2; cold-boot SITL with `cold-boot-fixture`; cold-start SUT (no FDR / cache warm-up); push first nav-camera frame; measure TTFF = `t_first_emission − t_first_frame_arrival`; assert p95 ≤ 30 s and max ≤ 45 s; record p50/p95/max.
+**Complexity**: 5 points
+**Dependencies**: AZ-406, AZ-407, AZ-444
+**Component**: Blackbox Tests / Performance (epic AZ-262)
+**Tracker**: AZ-430
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Cold-start TTFF (Time To First Fix) is critical for flight-resume scenarios — AC-NEW-1 sets the 30 s budget. This must be measured on real Jetson hardware over a sample large enough for p95 confidence.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/performance/test_nft_perf_03_ttff.py`. Tier-2 ONLY.
+- Sample of N=10 cold-starts (configurable via env var; default 10 to balance run time vs confidence).
+- Each iteration: clean `fdr-output` volume; load `cold-boot-fixture` into SITL; start SUT; push first frame; measure TTFF.
+- Assert `p95 TTFF ≤ 30 s` AND `max ≤ 45 s` (a relaxed cap to flag tail-latency outliers).
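
The percentile definition matters at this sample size. With N = 10 and the nearest-rank method, p95 is rank ⌈0.95 × 10⌉ = 10, i.e. the maximum itself, so the 30 s p95 gate would already bound the max and the 45 s cap could never fire. With linear interpolation (NumPy's default `percentile` method), p95 = s[8] + 0.55 × (s[9] − s[8]) over the sorted samples s (0-based), interpolating between the two slowest starts, so the two gates check different things. The sketch below assumes linear interpolation; the env-var name `NFT_PERF_03_ITERATIONS` is hypothetical, since the spec only says N is configurable via an env var.

```python
import math
import os
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical env-var name; the spec only says N is "configurable via env var".
ITERATIONS_ENV_VAR = "NFT_PERF_03_ITERATIONS"
MIN_ITERATIONS = 10  # default and floor (N >= 10)


def iteration_count(env: Optional[Mapping[str, str]] = None) -> int:
    env = os.environ if env is None else env
    n = int(env.get(ITERATIONS_ENV_VAR, MIN_ITERATIONS))
    if n < MIN_ITERATIONS:
        raise ValueError(f"N={n} is below the N >= {MIN_ITERATIONS} floor")
    return n


def percentile_linear(values: Sequence[float], p: float) -> float:
    """Linear interpolation between order statistics (NumPy's default method)."""
    if not values:
        raise ValueError("no TTFF samples")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def evaluate_ttff(ttff_s: Sequence[float], p95_budget_s: float = 30.0,
                  max_budget_s: float = 45.0) -> Tuple[float, float]:
    p95 = percentile_linear(ttff_s, 95)
    worst = max(ttff_s)
    assert p95 <= p95_budget_s, f"TTFF p95 {p95:.1f} s > {p95_budget_s} s"
    assert worst <= max_budget_s, f"TTFF max {worst:.1f} s > {max_budget_s} s"
    return p95, worst
```

Under this definition, nine starts of 20 to 28 s plus a single 40 s outlier give p95 = 34.6 s, so one slow boot in ten is enough to fail AC-3.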
+
+## Scope
+
+### Included
+- N ≥ 10 cold-start iterations (default 10).
+- Per-iteration TTFF measurement.
+- p50/p95/max computation + assertions.
+
+### Excluded
+- Position accuracy at first emission — owned by FT-P-11 (AZ-419).
+
+## Acceptance Criteria
+
+**AC-1: tier guard**
+Given the test is invoked with `tier == tier1-docker`
+Then the test SKIPs.
+
+**AC-2: clean state per iteration**
+Given each cold-start iteration
+Then `fdr-output` volume is removed + recreated AND any persistent SUT state is cleared.
+
+**AC-3: TTFF p95**
+Given N=10 (or more) iterations
+Then `p95(TTFF) ≤ 30 s`.
+
+**AC-4: TTFF max**
+Given the same iterations
+Then `max(TTFF) ≤ 45 s`.
+
+**AC-5: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end on real hardware through public boundaries.
+
+- **Allowed**: SITL parameter load, SITL state read, SUT lifecycle (`docker compose up` / `systemctl start`).
+- **Forbidden**: pre-warming SUT internal caches.
+
+## Constraints
+
+- Tier-2 only.
+- N=10 is the default; more iterations strengthen the p95 estimate but cost CI time.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-03
+- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-03 row)
@@ -0,0 +1,61 @@
+# NFT-PERF-04 — Spoofing-promotion latency budget
+
+**Task**: AZ-431_nft_perf_04_spoof_promotion
+**Name**: AC-NEW-2 spoofing-promotion latency p95 ≤ 600 ms (AC-NEW-2)
+**Description**: Implement NFT-PERF-04 — Tier-1 OR Tier-2; replay multiple blackout-onset events with paired GPS spoof injection (sample N=20+ via `blackout_spoof.py` with random window starts); per event measure `t_label_switch_to_dead_reckoned − t_blackout_onset`; assert `p95 ≤ 600 ms`.
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407, AZ-408
+**Component**: Blackbox Tests / Performance / Security (epic AZ-262)
+**Tracker**: AZ-431
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Spoofing-promotion latency (the time from a spoof+blackout event to the SUT correctly labeling estimates `dead_reckoned`) is a security-critical metric — slow promotion means the FC may briefly trust spoofed GPS. AC-NEW-2 sets the p95 ≤ 600 ms budget; this requires statistical sampling.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py`. Tier-1 OR Tier-2.
+- N=20 events (configurable; balances confidence vs CI time): each iteration places `blackout_spoof.py` at a different random window start in the Derkachi flight.
+- Per event: `t_blackout_onset` = injector-emitted timestamp; `t_label_switch_to_dead_reckoned` = first outbound emission with `source_label = dead_reckoned` after onset.
+- Assert `p95(latency) ≤ 600 ms`.
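
A minimal sketch of the per-event measurement and p95 check, assuming the injector onset timestamps and the SITL outbound capture share a millisecond clock and each outbound emission has been decoded to `(t_ms, source_label)`. Each event's search is bounded by the next onset, so a missing promotion fails loudly instead of borrowing the next event's switch. The tuple shape, nearest-rank p95, and helper names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

DEAD_RECKONED = "dead_reckoned"


def p95_nearest_rank(values: Sequence[float]) -> float:
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def promotion_latencies_ms(onsets_ms: Sequence[float],
                           emissions: Sequence[Tuple[float, str]]) -> List[float]:
    """Latency from each onset to the first later dead_reckoned emission."""
    onsets = sorted(onsets_ms)
    ordered_emissions = sorted(emissions)
    latencies = []
    for i, onset in enumerate(onsets):
        # Only look between this onset and the next one.
        window_end = onsets[i + 1] if i + 1 < len(onsets) else math.inf
        switch = next((t for t, label in ordered_emissions
                       if onset < t < window_end and label == DEAD_RECKONED), None)
        if switch is None:
            raise AssertionError(f"no {DEAD_RECKONED} emission after onset at {onset} ms")
        latencies.append(switch - onset)
    return latencies


def evaluate_spoof_promotion(onsets_ms: Sequence[float],
                             emissions: Sequence[Tuple[float, str]],
                             min_events: int = 20,
                             budget_ms: float = 600.0) -> float:
    if len(onsets_ms) < min_events:
        raise ValueError(f"only {len(onsets_ms)} events; AC-1 requires N >= {min_events}")
    p95 = p95_nearest_rank(promotion_latencies_ms(onsets_ms, emissions))
    assert p95 <= budget_ms, f"spoof-promotion p95 {p95:.0f} ms > {budget_ms:.0f} ms"
    return p95
```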
+
+## Scope
+
+### Included
+- N=20 spoof-blackout events at randomized window starts.
+- Per-event latency measurement.
+- p95 assertion.
+
+### Excluded
+- Functional correctness of the failsafe ladder — owned by FT-N-04 (AZ-426).
+
+## Acceptance Criteria
+
+**AC-1: N events sampled**
+Given the test runs
+Then N≥20 spoof-blackout events are exercised across the Derkachi flight.
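
One way to get random starts that still cover the whole flight, as AC-1 asks, is stratified sampling: split the usable part of the replay into N equal strata and draw one start uniformly inside each, leaving room for the event so that events never overlap. This is a swapped-in technique for illustration, not necessarily how `blackout_spoof.py` chooses window starts; the flight length, event span, warm-up, and seed parameters are hypothetical.

```python
import random
from typing import List


def stratified_window_starts(flight_s: float, event_span_s: float, n: int = 20,
                             seed: int = 0, warmup_s: float = 0.0) -> List[float]:
    """One random start per equal-width stratum of [warmup_s, flight_s]."""
    stratum_s = (flight_s - warmup_s) / n
    if stratum_s < event_span_s:
        raise ValueError("flight too short for n non-overlapping events")
    rng = random.Random(seed)  # seeded so a failing run can be replayed exactly
    starts = []
    for k in range(n):
        stratum_start = warmup_s + k * stratum_s
        # Keep event_span_s of headroom so each event ends inside its stratum.
        starts.append(stratum_start + rng.uniform(0.0, stratum_s - event_span_s))
    return starts
```

Logging the seed with the evidence bundle would make any p95 failure reproducible with the same event placement.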
+
+**AC-2: latency p95**
+Given the per-event latencies
+Then `p95(latency) ≤ 600 ms`.
+
+**AC-3: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: blackout_spoof injector timestamps, SITL outbound capture.
+- **Forbidden**: instrumenting SUT internals for the switch event; the switch is detected via outbound stream.
+
+## Constraints
+
+- The injector emits a timestamp at the moment of onset (a public artifact); this is the `t_blackout_onset` reference.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-04
+- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-04 row)