[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios

Batch 85: 4 Performance NFT scenarios + pure-logic evaluators.

- NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage partition recording (D-CROSS-LATENCY-1).
- NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP).
- NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations.
- NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events.

All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline preserved: evaluators do NOT import src/gps_denied_onboard.

Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio).

Code review: PASS_WITH_WARNINGS, 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 18:51:15 +00:00 · 2026-05-17 16:46:49 +03:00
parent f25cae4a82
commit 73cd632e95
21 changed files with 3063 additions and 6 deletions
@@ -1,85 +0,0 @@
-# NFT-PERF-01 — End-to-end latency p95 budget
-
-**Task**: AZ-428_nft_perf_01_e2e_latency
-**Name**: AC-4.1 latency p95 ≤ 400 ms on Tier-2 (Jetson Orin Nano Super) at 25 °C and 50 °C ambient (AC-4.1, D-CROSS-LATENCY-1)
-**Description**: Implement NFT-PERF-01 — Tier-2-only; 30 s warm-up + 5 min Derkachi replay at 3 Hz; per-frame latency = `t_emit_at_sitl − t_capture`; record CSV; compute p50/p95/p99; per-stage latency partitioning per D-CROSS-LATENCY-1 table; repeat at +50 °C ambient.
-**Complexity**: 5 points
-**Dependencies**: AZ-406, AZ-407, AZ-444 (Tier-2 runner)
-**Component**: Blackbox Tests / Performance (epic AZ-262)
-**Tracker**: AZ-428
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-End-to-end latency is the AC-4.1 hard gate; D-CROSS-LATENCY-1 sets a 400 ms p95 budget across two configurations (K=3 baseline at 25 °C, K=2 + Jacobian-cov hybrid at 50 °C). Without measurement on real Jetson hardware these budgets are unverified.
-
-## Outcome
-
- pytest scenario at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py`. Tier-2 ONLY (skipped on Tier-1 with the documented reason).
- Two configurations measured:
-  - (a) K=3 baseline at +25 °C
-  - (b) K=2 + Jacobian-cov hybrid auto-degrade at +50 °C ambient (chamber if available, else flagged)
- Per-frame latency CSV; p50/p95/p99 computation; assertion `p95 ≤ 400 ms` per config.
- Per-stage latency partitioning per D-CROSS-LATENCY-1 table (informational only — recorded for trend; not pass/fail).
-
-## Scope
-
-### Included
- Tier-2 entrypoint via `run-tier2.sh`.
- 30 s warm-up + 5 min measurement window.
- Per-frame `t_emit_at_sitl − t_capture` measurement.
- Per-stage partition (C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter) recorded for trend.
- Frame-drop accounting (≤10 % under sustained load per AC-4.1).
-
-### Excluded
- Frame-by-frame inter-emit interval — owned by NFT-PERF-02 (AZ-429).
- Cold-start TTFF — owned by NFT-PERF-03 (AZ-430).
- Spoofing-promotion latency — owned by NFT-PERF-04 (AZ-431).
- The +50 °C chamber portion of AC-NEW-5 — that's a release-gate test handled separately.
-
-## Acceptance Criteria
-
-**AC-1: tier guard**
-Given the test is invoked
-When `tier == tier1-docker`
-Then the test SKIPs with reason `Tier-2 only — Jetson hardware required`.
-
-**AC-2: K=3 baseline @ +25 °C — p95 budget**
-Given the K=3 baseline configuration at +25 °C ambient
-When the 5 min measurement window completes
-Then `p95(t_emit_at_sitl − t_capture) ≤ 400 ms`.
-
-**AC-3: K=2 hybrid @ +50 °C — p95 budget**
-Given the K=2 + Jacobian-cov hybrid auto-degrade configuration at +50 °C
-When the 5 min measurement window completes
-Then `p95 ≤ 400 ms` is STILL satisfied (proves D-CROSS-LATENCY-1 is effective).
-
-**AC-4: frame-drop allowance**
-Given a sustained 5 min run
-Then the frame-drop count is ≤10 % of expected emissions (3 Hz × 300 s = 900 expected).
-
-**AC-5: per-stage partition recorded**
-Given the partition table (D-CROSS-LATENCY-1)
-Then per-stage p95 values are recorded for all named stages; values are emitted into the evidence bundle as `nft-perf-01-partition.csv`. Assertion is informational (no hard threshold).
-
-**AC-6: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`. Both configurations (a) and (b) run per parameterization.
-
-## System Under Test Boundary
-
-End-to-end on real hardware through public boundaries.
-
- **Allowed**: SITL-side timestamping at message receipt; `t_capture` from frame source metadata (a public artifact).
- **Forbidden**: instrumenting the SUT internally to read per-component timings; per-stage partitioning relies on what the SUT exposes via `NAMED_VALUE_FLOAT` or FDR (already-public).
-
-## Constraints
-
- Tier-2 only — invoking on Tier-1 SKIPs.
- Chamber availability for +50 °C is a documented optional environment; absent a chamber, the test runs at workstation ambient and emits a `chamber-unavailable` flag in the evidence.
- Per-stage partition is informational only (no pass/fail threshold).
-
-## Document Dependencies
-
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-01
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-01 row)
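
The NFT-PERF-01 gates above (AC-2/AC-3 p95 budget, AC-4 frame-drop allowance) can be sketched as a pure-logic evaluator that takes only measured latencies and imports nothing from the SUT. This is a minimal sketch, not the evaluator shipped in this commit: the names `percentile`, `LatencyReport`, and `evaluate_e2e_latency` are hypothetical, and the nearest-rank percentile method is an assumption, since the spec does not name one.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile (assumed method; the spec does not name one)."""
    if not values:
        # Fail loudly on a missing or empty fixture instead of passing vacuously.
        raise ValueError("no latency samples: fixture missing or empty")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencyReport:
    p50_ms: float
    p95_ms: float
    p99_ms: float
    drop_ratio: float
    passed: bool


def evaluate_e2e_latency(
    latencies_ms: Sequence[float],
    expected_emissions: int = 900,   # 3 Hz x 300 s, per AC-4
    p95_budget_ms: float = 400.0,    # AC-2 / AC-3
    max_drop_ratio: float = 0.10,    # AC-4
) -> LatencyReport:
    """Apply the p95 budget and the frame-drop allowance to one configuration."""
    p50, p95, p99 = (percentile(latencies_ms, q) for q in (50, 95, 99))
    dropped = max(expected_emissions - len(latencies_ms), 0)
    drop_ratio = dropped / expected_emissions
    passed = p95 <= p95_budget_ms and drop_ratio <= max_drop_ratio
    return LatencyReport(p50, p95, p99, drop_ratio, passed)
```

Each of the two configurations (K=3 at +25 °C, K=2 hybrid at +50 °C) would get its own call, with `latencies_ms` holding the per-frame `t_emit_at_sitl − t_capture` values read from the latency CSV.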
@@ -1,58 +0,0 @@
-# NFT-PERF-02 — Frame-by-frame streaming (no batching)
-
-**Task**: AZ-429_nft_perf_02_streaming
-**Name**: Estimates streamed frame-by-frame; no batching/delay (AC-4.4)
-**Description**: Implement NFT-PERF-02 — replay Derkachi 5 min at 3 Hz; observe inter-emit interval at SITL; assert p95 inter-emit ≤ inter-frame × 1.05 (≤350 ms at 3 Hz target); no window of ≥3 missed-emit gaps.
-**Complexity**: 2 points
-**Dependencies**: AZ-406, AZ-407
-**Component**: Blackbox Tests / Performance (epic AZ-262)
-**Tracker**: AZ-429
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Frame-by-frame streaming (vs hidden batching) is an AC-4.4 contract: the operator UX assumes a steady cadence. This is easy to validate via the inter-emit interval distribution.
-
-## Outcome
-
- pytest scenario at `e2e/tests/performance/test_nft_perf_02_streaming.py`. Tier-1 OR Tier-2.
- 5 min Derkachi replay at 3 Hz target; SITL-side inter-emit interval distribution; assert p95 ≤ 350 ms (inter-frame × 1.05); no window of ≥3 consecutive missed emits.
-
-## Scope
-
-### Included
- Inter-emit interval measurement at SITL.
- p95 assertion + missed-emit-window assertion.
-
-### Excluded
- End-to-end latency (`t_capture → t_emit`) — owned by NFT-PERF-01.
-
-## Acceptance Criteria
-
-**AC-1: p95 inter-emit ≤ 350 ms**
-Given the 5 min replay at 3 Hz target
-Then `p95(inter_emit_interval) ≤ 350 ms`.
-
-**AC-2: no ≥3-emit gap**
-Given the inter-emit stream
-Then no window contains ≥3 consecutive missed emits (a "missed emit" is an interval > 2 × target = 666 ms at 3 Hz).
-
-**AC-3: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)` on both tiers.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: SITL receipt timestamps.
- **Forbidden**: importing SUT scheduler internals.
-
-## Constraints
-
- "Inter-emit interval" is the SITL-side timestamp delta between consecutive accepted GPS_INPUT / MSP2_SENSOR_GPS messages.
-
-## Document Dependencies
-
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-02
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-02 row)
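
The two NFT-PERF-02 checks above (AC-1 p95 inter-emit, AC-2 missed-emit window) reduce to simple arithmetic on SITL receipt timestamps. The sketch below assumes the timestamps have already been extracted into a list of seconds; the function names are hypothetical and the nearest-rank p95 is an assumed method.

```python
import math
from typing import List, Sequence, Tuple


def inter_emit_intervals_ms(timestamps_s: Sequence[float]) -> List[float]:
    """Deltas between consecutive accepted messages, in milliseconds."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two receipt timestamps")
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]


def longest_missed_run(intervals_ms: Sequence[float], threshold_ms: float) -> int:
    """Length of the longest run of consecutive intervals above the threshold."""
    longest = current = 0
    for interval in intervals_ms:
        current = current + 1 if interval > threshold_ms else 0
        longest = max(longest, current)
    return longest


def evaluate_streaming(
    timestamps_s: Sequence[float], target_hz: float = 3.0
) -> Tuple[float, int, bool]:
    """Return (p95 interval in ms, longest missed-emit run, pass/fail)."""
    target_ms = 1000.0 / target_hz
    intervals = inter_emit_intervals_ms(timestamps_s)
    ordered = sorted(intervals)
    p95 = ordered[math.ceil(95 * len(ordered) / 100) - 1]  # nearest rank (assumed)
    run = longest_missed_run(intervals, 2 * target_ms)     # "missed" = > 2 x target
    passed = p95 <= target_ms * 1.05 and run < 3           # AC-1 and AC-2
    return p95, run, passed
```

At the 3 Hz target the inter-frame period is 1000 / 3 ≈ 333.3 ms, so the AC-1 bound is 333.3 × 1.05 = 350 ms and the missed-emit threshold is 2 × 333.3 ≈ 666.7 ms.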
@@ -1,70 +0,0 @@
-# NFT-PERF-03 — Cold-start TTFF
-
-**Task**: AZ-430_nft_perf_03_ttff
-**Name**: AC-NEW-1 cold-start TTFF ≤ 30 s on Tier-2 (AC-NEW-1)
-**Description**: Implement NFT-PERF-03 — Tier-2; cold-boot SITL with `cold-boot-fixture`; cold-start SUT (no FDR / cache warm-up); push first nav-camera frame; measure `t_first_emission − t_first_frame_arrival ≤ 30 s`; record p50/p95.
-**Complexity**: 5 points
-**Dependencies**: AZ-406, AZ-407, AZ-444
-**Component**: Blackbox Tests / Performance (epic AZ-262)
-**Tracker**: AZ-430
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Cold-start TTFF (Time To First Fix) is critical for flight-resume scenarios — AC-NEW-1 sets the 30 s budget. This must be measured on real Jetson hardware over a sample large enough for p95 confidence.
-
-## Outcome
-
- pytest scenario at `e2e/tests/performance/test_nft_perf_03_ttff.py`. Tier-2 ONLY.
- Sample of N=10 cold-starts (configurable via env var; default 10 to balance run time vs confidence).
- Each iteration: clean `fdr-output` volume; load `cold-boot-fixture` into SITL; start SUT; push first frame; measure TTFF.
- Assert `p95 TTFF ≤ 30 s` AND `max ≤ 45 s` (a relaxed cap to flag tail-latency outliers).
-
-## Scope
-
-### Included
- N=10 cold-start iterations.
- Per-iteration TTFF measurement.
- p50/p95/max computation + assertions.
-
-### Excluded
- Position accuracy at first emission — owned by FT-P-11 (AZ-419).
-
-## Acceptance Criteria
-
-**AC-1: tier guard**
-Given the test is invoked with `tier == tier1-docker`
-Then the test SKIPs.
-
-**AC-2: clean state per iteration**
-Given each cold-start iteration
-Then `fdr-output` volume is removed + recreated AND any persistent SUT state is cleared.
-
-**AC-3: TTFF p95**
-Given N=10 (or more) iterations
-Then `p95(TTFF) ≤ 30 s`.
-
-**AC-4: TTFF max**
-Given the same iterations
-Then `max(TTFF) ≤ 45 s`.
-
-**AC-5: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end on real hardware through public boundaries.
-
- **Allowed**: SITL parameter load, SITL state read, SUT lifecycle (`docker compose up` / `systemctl start`).
- **Forbidden**: pre-warming SUT internal caches.
-
-## Constraints
-
- Tier-2 only.
- N=10 is the default; more iterations strengthen the p95 estimate but cost CI time.
-
-## Document Dependencies
-
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-03
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-03 row)
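
The NFT-PERF-03 assertions above (AC-3 p95, AC-4 max) can be sketched the same way over the per-iteration TTFF values. Everything named here is an assumption for illustration: the spec says the iteration count is configurable via an env var but does not name it, so `NFT_PERF_03_ITERATIONS` is hypothetical, as are the function names and the nearest-rank percentile method.

```python
import math
from typing import Dict, Mapping, Sequence

ENV_VAR = "NFT_PERF_03_ITERATIONS"  # hypothetical name; the spec does not give one


def iteration_count(env: Mapping[str, str], default: int = 10) -> int:
    """Read the iteration count, refusing anything below the N >= 10 floor."""
    n = int(env.get(ENV_VAR, default))
    if n < 10:
        raise ValueError("AC-3 requires N >= 10 cold-start iterations")
    return n


def evaluate_ttff(
    ttff_s: Sequence[float],
    min_samples: int = 10,
    p95_budget_s: float = 30.0,  # AC-3
    max_cap_s: float = 45.0,     # AC-4
) -> Dict[str, object]:
    """Compute p50/p95/max over cold-start TTFF samples and apply both gates."""
    n = len(ttff_s)
    if n < min_samples:
        raise ValueError(f"only {n} TTFF samples, need at least {min_samples}")
    ordered = sorted(ttff_s)
    p50 = ordered[math.ceil(50 * n / 100) - 1]  # nearest rank (assumed)
    p95 = ordered[math.ceil(95 * n / 100) - 1]
    worst = ordered[-1]
    passed = p95 <= p95_budget_s and worst <= max_cap_s
    return {"p50_s": p50, "p95_s": p95, "max_s": worst, "passed": passed}
```

One consequence of the nearest-rank choice: at the default N=10 the p95 rank is ceil(9.5) = 10, so p95 equals the maximum and the 45 s cap only becomes an independent check from N=20 upward. An interpolating percentile would behave differently, so the method is worth pinning down explicitly.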
@@ -1,61 +0,0 @@
-# NFT-PERF-04 — Spoofing-promotion latency budget
-
-**Task**: AZ-431_nft_perf_04_spoof_promotion
-**Name**: AC-NEW-2 spoofing-promotion latency p95 ≤ 600 ms (AC-NEW-2)
-**Description**: Implement NFT-PERF-04 — Tier-1 OR Tier-2; replay multiple blackout-onset events with paired GPS spoof injection (sample N=20+ via `blackout_spoof.py` with random window starts); per event measure `t_label_switch_to_dead_reckoned − t_blackout_onset`; assert `p95 ≤ 600 ms`.
-**Complexity**: 3 points
-**Dependencies**: AZ-406, AZ-407, AZ-408
-**Component**: Blackbox Tests / Performance / Security (epic AZ-262)
-**Tracker**: AZ-431
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Spoofing-promotion latency (the time from a spoof+blackout event to the SUT correctly labeling estimates `dead_reckoned`) is a security-critical metric — slow promotion means the FC may briefly trust spoofed GPS. AC-NEW-2 sets the p95 ≤ 600 ms budget; this requires statistical sampling.
-
-## Outcome
-
- pytest scenario at `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py`. Tier-1 OR Tier-2.
- N=20 events (configurable; balances confidence vs CI time): each iteration places `blackout_spoof.py` at a different random window start in the Derkachi flight.
- Per event: `t_blackout_onset` = injector-emitted timestamp; `t_label_switch_to_dead_reckoned` = first outbound emission with `source_label = dead_reckoned` after onset.
- Assert `p95(latency) ≤ 600 ms`.
-
-## Scope
-
-### Included
- N=20 spoof-blackout events at randomized window starts.
- Per-event latency measurement.
- p95 assertion.
-
-### Excluded
- Functional correctness of the failsafe ladder — owned by FT-N-04 (AZ-426).
-
-## Acceptance Criteria
-
-**AC-1: N events sampled**
-Given the test runs
-Then N≥20 spoof-blackout events are exercised across the Derkachi flight.
-
-**AC-2: latency p95**
-Given the per-event latencies
-Then `p95(latency) ≤ 600 ms`.
-
-**AC-3: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: blackout_spoof injector timestamps, SITL outbound capture.
- **Forbidden**: instrumenting SUT internals for the switch event; the switch is detected via outbound stream.
-
-## Constraints
-
- The injector emits a timestamp at the moment of onset (a public artifact); this is the `t_blackout_onset` reference.
-
-## Document Dependencies
-
- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-04
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-04 row)
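
The per-event measurement described above (`t_label_switch_to_dead_reckoned − t_blackout_onset`) and the AC-2 p95 gate can be sketched as follows. This is a sketch under stated assumptions: emissions are modeled as `(timestamp_s, source_label)` pairs already captured from the outbound stream, the function names are hypothetical, an emission at exactly the onset timestamp is counted as "after onset", and the percentile uses nearest rank.

```python
import math
from typing import Iterable, Sequence, Tuple

Emission = Tuple[float, str]  # (receipt timestamp in seconds, source_label)


def promotion_latency_ms(t_onset_s: float, emissions: Iterable[Emission]) -> float:
    """Latency from blackout onset to the first dead_reckoned emission."""
    for t, label in sorted(emissions):
        if t >= t_onset_s and label == "dead_reckoned":
            return (t - t_onset_s) * 1000.0
    # Fail loudly: a missing switch must never count as a fast one.
    raise LookupError("no dead_reckoned emission observed after onset")


def evaluate_spoof_promotion(
    latencies_ms: Sequence[float],
    min_events: int = 20,       # AC-1
    budget_ms: float = 600.0,   # AC-2
) -> Tuple[float, bool]:
    """Return (p95 latency in ms, pass/fail) over all sampled events."""
    n = len(latencies_ms)
    if n < min_events:
        raise ValueError(f"only {n} events sampled, need at least {min_events}")
    ordered = sorted(latencies_ms)
    p95 = ordered[math.ceil(95 * n / 100) - 1]  # nearest rank (assumed)
    return p95, p95 <= budget_ms
```

A caller would compute one latency per injected event, using the injector-emitted onset timestamp and the outbound capture for that event, then pass the collected list to `evaluate_spoof_promotion`.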