mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 18:51:15 +00:00
[AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios
Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators.

- NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage partition recording (D-CROSS-LATENCY-1).
- NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP).
- NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations.
- NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events.

All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline preserved — evaluators do NOT import src/gps_denied_onboard.

Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio).
Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,85 +0,0 @@
# NFT-PERF-01 — End-to-end latency p95 budget

**Task**: AZ-428_nft_perf_01_e2e_latency
**Name**: AC-4.1 latency p95 ≤ 400 ms on Tier-2 (Jetson Orin Nano Super) at 25 °C and 50 °C ambient (AC-4.1, D-CROSS-LATENCY-1)
**Description**: Implement NFT-PERF-01 — Tier-2-only; 30 s warm-up + 5 min Derkachi replay at 3 Hz; per-frame latency = `t_emit_at_sitl − t_capture`; record CSV; compute p50/p95/p99; per-stage latency partitioning per D-CROSS-LATENCY-1 table; repeat at +50 °C ambient.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-444 (Tier-2 runner)
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-428
**Epic**: AZ-262 (E-BBT)

## Problem

End-to-end latency is the AC-4.1 hard gate; D-CROSS-LATENCY-1 sets a 400 ms p95 budget across two configurations (K=3 baseline at 25 °C, K=2 + Jacobian-cov hybrid at 50 °C). Without measurement on real Jetson hardware these budgets are unverified.

## Outcome

- pytest scenario at `e2e/tests/performance/test_nft_perf_01_e2e_latency.py`. Tier-2 ONLY (skipped on Tier-1 with the documented reason).
- Two configurations measured:
  - (a) K=3 baseline at +25 °C
  - (b) K=2 + Jacobian-cov hybrid auto-degrade at +50 °C ambient (chamber if available, else flagged)
- Per-frame latency CSV; p50/p95/p99 computation; assertion `p95 ≤ 400 ms` per config.
- Per-stage latency partitioning per D-CROSS-LATENCY-1 table (informational only — recorded for trend; not pass/fail).
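
The latency evaluation listed above could be sketched as a pure-logic function. This is a minimal sketch, not the evaluator shipped in the commit: the names `percentile`, `evaluate_latency`, and `P95_BUDGET_MS` are hypothetical, the input is assumed to be `(t_capture, t_emit_at_sitl)` pairs in seconds, and linear interpolation between ranks is an assumed percentile method because the task text does not name one.

```python
import math

P95_BUDGET_MS = 400.0  # budget quoted from AC-4.1 / D-CROSS-LATENCY-1


def percentile(samples, q):
    """Linear-interpolation percentile (assumed method); q is in [0, 100]."""
    if not samples:
        raise ValueError("no samples: fixture missing or empty")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def evaluate_latency(frames):
    """frames: iterable of (t_capture_s, t_emit_at_sitl_s) pairs (assumed shape)."""
    latencies_ms = [(t_emit - t_cap) * 1000.0 for t_cap, t_emit in frames]
    stats = {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
    stats["pass"] = stats["p95"] <= P95_BUDGET_MS
    return stats
```

Under these assumptions the same function would be called once per configuration, (a) and (b), so each gets its own p95 verdict.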

## Scope

### Included
- Tier-2 entrypoint via `run-tier2.sh`.
- 30 s warm-up + 5 min measurement window.
- Per-frame `t_emit_at_sitl − t_capture` measurement.
- Per-stage partition (C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter) recorded for trend.
- Frame-drop accounting (≤10 % under sustained load per AC-4.1).

### Excluded
- Frame-by-frame inter-emit interval — owned by NFT-PERF-02 (AZ-429).
- Cold-start TTFF — owned by NFT-PERF-03 (AZ-430).
- Spoofing-promotion latency — owned by NFT-PERF-04 (AZ-431).
- The +50 °C chamber portion of AC-NEW-5 — that's a release-gate test handled separately.

## Acceptance Criteria

**AC-1: tier guard**
Given the test is invoked
When `tier == tier1-docker`
Then the test SKIPs with reason `Tier-2 only — Jetson hardware required`.

**AC-2: K=3 baseline @ +25 °C — p95 budget**
Given the K=3 baseline configuration at +25 °C ambient
When the 5 min measurement window completes
Then `p95(t_emit_at_sitl − t_capture) ≤ 400 ms`.

**AC-3: K=2 hybrid @ +50 °C — p95 budget**
Given the K=2 + Jacobian-cov hybrid auto-degrade configuration at +50 °C
When the 5 min measurement window completes
Then `p95 ≤ 400 ms` is STILL satisfied (proving that D-CROSS-LATENCY-1 is effective).

**AC-4: frame-drop allowance**
Given a sustained 5 min run
Then the frame-drop count is ≤10 % of expected emissions (3 Hz × 300 s = 900 expected).
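
The arithmetic behind AC-4 can be made explicit with a small sketch. It assumes a drop is counted as an expected emission that never arrived, so `dropped = expected − emitted`; the function name `frame_drop_report` and its parameters are hypothetical.

```python
def frame_drop_report(emitted_count, rate_hz=3.0, window_s=300.0, max_drop_ratio=0.10):
    """Compare observed emissions against the expected count for the window."""
    expected = round(rate_hz * window_s)  # 3 Hz x 300 s = 900 expected emissions
    if expected <= 0:
        raise ValueError("measurement window produces no expected emissions")
    # Assumption: extra emissions never count as negative drops.
    dropped = max(expected - emitted_count, 0)
    ratio = dropped / expected
    return {
        "expected": expected,
        "dropped": dropped,
        "drop_ratio": ratio,
        "pass": ratio <= max_drop_ratio,
    }
```

With the defaults, the allowance works out to at most 90 dropped frames out of 900, so 810 observed emissions is the lowest passing count under this reading.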

**AC-5: per-stage partition recorded**
Given the partition table (D-CROSS-LATENCY-1)
Then per-stage p95 values are recorded for all named stages; values are emitted into the evidence bundle as `nft-perf-01-partition.csv`. Assertion is informational (no hard threshold).
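
One possible shape for the `nft-perf-01-partition.csv` content is sketched below. The column names (`stage`, `p95_ms`, `samples`) and the function name `partition_csv` are assumptions, as is the input format: a mapping from stage name to per-frame durations in milliseconds, already extracted from what the SUT exposes publicly. The sketch only records values and applies no threshold, matching the informational intent.

```python
import csv
import io
import math


def p95(samples):
    """Linear-interpolation 95th percentile (assumed method)."""
    if not samples:
        raise ValueError("stage has no samples")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * 0.95
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def partition_csv(stage_samples_ms):
    """Render one CSV row per stage; no pass/fail decision is made here."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["stage", "p95_ms", "samples"])  # hypothetical column names
    for stage, samples in stage_samples_ms.items():
        writer.writerow([stage, f"{p95(samples):.3f}", len(samples)])
    return buf.getvalue()
```

Raising on an empty stage is a design choice in this sketch: it keeps a missing stage from silently disappearing from the evidence bundle.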

**AC-6: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`. Both configurations (a) and (b) run per parameterization.

## System Under Test Boundary

End-to-end on real hardware through public boundaries.

- **Allowed**: SITL-side timestamping at message receipt; `t_capture` from frame source metadata (a public artifact).
- **Forbidden**: instrumenting the SUT internally to read per-component timings; per-stage partitioning relies on what the SUT exposes via `NAMED_VALUE_FLOAT` or FDR (already-public).

## Constraints

- Tier-2 only — invoking on Tier-1 SKIPs.
- Chamber availability for +50 °C is a documented optional environment; absent a chamber, the test runs at workstation ambient and emits a `chamber-unavailable` flag in the evidence.
- Per-stage partition is informational only (no pass/fail threshold).

## Document Dependencies

- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-01
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-01 row)
@@ -1,58 +0,0 @@
# NFT-PERF-02 — Frame-by-frame streaming (no batching)

**Task**: AZ-429_nft_perf_02_streaming
**Name**: Estimates streamed frame-by-frame; no batching/delay (AC-4.4)
**Description**: Implement NFT-PERF-02 — replay Derkachi 5 min at 3 Hz; observe inter-emit interval at SITL; assert p95 inter-emit ≤ inter-frame × 1.05 (≤350 ms at 3 Hz target); no window of ≥3 missed-emit gaps.
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-429
**Epic**: AZ-262 (E-BBT)

## Problem

Frame-by-frame streaming (vs hidden batching) is an AC-4.4 contract — the operator UX assumes a steady cadence. This is easy to validate via the inter-emit interval distribution.

## Outcome

- pytest scenario at `e2e/tests/performance/test_nft_perf_02_streaming.py`. Tier-1 OR Tier-2.
- 5 min Derkachi replay at 3 Hz target; SITL-side inter-emit interval distribution; assert p95 ≤ 350 ms (inter-frame × 1.05); no window of ≥3 consecutive missed emits.

## Scope

### Included
- Inter-emit interval measurement at SITL.
- p95 assertion + missed-emit-window assertion.

### Excluded
- End-to-end latency (`t_capture → t_emit`) — owned by NFT-PERF-01.

## Acceptance Criteria

**AC-1: p95 inter-emit ≤ 350 ms**
Given the 5 min replay at 3 Hz target
Then `p95(inter_emit_interval) ≤ 350 ms`.

**AC-2: no ≥3-emit gap**
Given the inter-emit stream
Then no window contains ≥3 consecutive missed emits (a "missed emit" is an interval > 2 × target = 666 ms at 3 Hz).
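
AC-1 and AC-2 can be evaluated together from a list of SITL receipt timestamps. The sketch below is one reading of the criteria, not the shipped evaluator: it treats AC-2 as "no run of three or more consecutive intervals above the missed-emit threshold", derives that threshold from the 3 Hz target rather than hard-coding 666 ms, and uses linear-interpolation percentiles. All names are hypothetical.

```python
import math

TARGET_HZ = 3.0
P95_BUDGET_MS = 350.0                      # inter-frame interval (333.3 ms) x 1.05
MISSED_EMIT_MS = 2.0 * 1000.0 / TARGET_HZ  # an interval above 2 x target counts as missed


def percentile(samples, q):
    """Linear-interpolation percentile (assumed method); q is in [0, 100]."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def evaluate_streaming(receipt_ts_s):
    """receipt_ts_s: time-ordered SITL receipt timestamps in seconds."""
    if len(receipt_ts_s) < 2:
        raise ValueError("need at least two emissions to form an interval")
    intervals_ms = [(b - a) * 1000.0 for a, b in zip(receipt_ts_s, receipt_ts_s[1:])]
    longest_run = run = 0
    for dt in intervals_ms:
        run = run + 1 if dt > MISSED_EMIT_MS else 0  # extend or reset the missed run
        longest_run = max(longest_run, run)
    p95 = percentile(intervals_ms, 95)
    return {
        "p95_ms": p95,
        "longest_missed_run": longest_run,
        "pass": p95 <= P95_BUDGET_MS and longest_run < 3,
    }
```

Keeping the two checks in one result makes it visible when a run passes the p95 budget yet still contains a burst of late emissions.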

**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)` on both tiers.

## System Under Test Boundary

End-to-end through public boundaries.

- **Allowed**: SITL receipt timestamps.
- **Forbidden**: importing SUT scheduler internals.

## Constraints

- "Inter-emit interval" is the SITL-side timestamp delta between consecutive accepted GPS_INPUT / MSP2_SENSOR_GPS messages.

## Document Dependencies

- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-02
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-02 row)
@@ -1,70 +0,0 @@
# NFT-PERF-03 — Cold-start TTFF

**Task**: AZ-430_nft_perf_03_ttff
**Name**: AC-NEW-1 cold-start TTFF ≤ 30 s on Tier-2 (AC-NEW-1)
**Description**: Implement NFT-PERF-03 — Tier-2; cold-boot SITL with `cold-boot-fixture`; cold-start SUT (no FDR / cache warm-up); push first nav-camera frame; measure `t_first_emission − t_first_frame_arrival ≤ 30 s`; record p50/p95.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-444
**Component**: Blackbox Tests / Performance (epic AZ-262)
**Tracker**: AZ-430
**Epic**: AZ-262 (E-BBT)

## Problem

Cold-start TTFF (Time To First Fix) is critical for flight-resume scenarios — AC-NEW-1 sets the 30 s budget. This must be measured on real Jetson hardware over a sample large enough for p95 confidence.

## Outcome

- pytest scenario at `e2e/tests/performance/test_nft_perf_03_ttff.py`. Tier-2 ONLY.
- Sample of N=10 cold-starts (configurable via env var; default 10 to balance run time vs confidence).
- Each iteration: clean `fdr-output` volume; load `cold-boot-fixture` into SITL; start SUT; push first frame; measure TTFF.
- Assert `p95 TTFF ≤ 30 s` AND `max ≤ 45 s` (a relaxed cap to flag tail-latency outliers).

## Scope

### Included
- N=10 cold-start iterations.
- Per-iteration TTFF measurement.
- p50/p95/max computation + assertions.

### Excluded
- Position accuracy at first emission — owned by FT-P-11 (AZ-419).

## Acceptance Criteria

**AC-1: tier guard**
Given the test is invoked with `tier == tier1-docker`
Then the test SKIPs.

**AC-2: clean state per iteration**
Given each cold-start iteration
Then `fdr-output` volume is removed + recreated AND any persistent SUT state is cleared.

**AC-3: TTFF p95**
Given N=10 (or more) iterations
Then `p95(TTFF) ≤ 30 s`.

**AC-4: TTFF max**
Given the same iterations
Then `max(TTFF) ≤ 45 s`.
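
The p95 and max checks in AC-3 and AC-4 could be combined in a single evaluator, sketched here under stated assumptions: per-iteration TTFF values arrive as seconds in a list, the percentile uses linear interpolation, and the names `percentile` and `evaluate_ttff` are hypothetical.

```python
import math


def percentile(samples, q):
    """Linear-interpolation percentile (assumed method); q is in [0, 100]."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def evaluate_ttff(ttff_s, min_iterations=10, p95_budget_s=30.0, max_budget_s=45.0):
    """ttff_s: one TTFF measurement (seconds) per cold-start iteration."""
    if len(ttff_s) < min_iterations:
        raise ValueError(f"need at least {min_iterations} iterations, got {len(ttff_s)}")
    p95 = percentile(ttff_s, 95)
    worst = max(ttff_s)
    return {
        "p50_s": percentile(ttff_s, 50),
        "p95_s": p95,
        "max_s": worst,
        "pass": p95 <= p95_budget_s and worst <= max_budget_s,
    }
```

The percentile method matters at small N: under a nearest-rank definition with N=10, p95 is simply the largest sample, which would make the separate max cap redundant. The interpolated form above is only one possible choice.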

**AC-5: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.

## System Under Test Boundary

End-to-end on real hardware through public boundaries.

- **Allowed**: SITL parameter load, SITL state read, SUT lifecycle (`docker compose up` / `systemctl start`).
- **Forbidden**: pre-warming SUT internal caches.

## Constraints

- Tier-2 only.
- N=10 is the default; more iterations strengthen the p95 estimate but cost CI time.

## Document Dependencies

- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-03
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-03 row)
@@ -1,61 +0,0 @@
# NFT-PERF-04 — Spoofing-promotion latency budget

**Task**: AZ-431_nft_perf_04_spoof_promotion
**Name**: AC-NEW-2 spoofing-promotion latency p95 ≤ 600 ms (AC-NEW-2)
**Description**: Implement NFT-PERF-04 — Tier-1 OR Tier-2; replay multiple blackout-onset events with paired GPS spoof injection (sample N=20+ via `blackout_spoof.py` with random window starts); per event measure `t_label_switch_to_dead_reckoned − t_blackout_onset`; assert `p95 ≤ 600 ms`.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Performance / Security (epic AZ-262)
**Tracker**: AZ-431
**Epic**: AZ-262 (E-BBT)

## Problem

Spoofing-promotion latency (the time from a spoof+blackout event to the SUT correctly labeling estimates `dead_reckoned`) is a security-critical metric — slow promotion means the FC may briefly trust spoofed GPS. AC-NEW-2 sets the p95 ≤ 600 ms budget; this requires statistical sampling.

## Outcome

- pytest scenario at `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py`. Tier-1 OR Tier-2.
- N=20 events (configurable; balances confidence vs CI time): each iteration places `blackout_spoof.py` at a different random window start in the Derkachi flight.
- Per event: `t_blackout_onset` = injector-emitted timestamp; `t_label_switch_to_dead_reckoned` = first outbound emission with `source_label = dead_reckoned` after onset.
- Assert `p95(latency) ≤ 600 ms`.

## Scope

### Included
- N=20 spoof-blackout events at randomized window starts.
- Per-event latency measurement.
- p95 assertion.

### Excluded
- Functional correctness of the failsafe ladder — owned by FT-N-04 (AZ-426).

## Acceptance Criteria

**AC-1: N events sampled**
Given the test runs
Then N≥20 spoof-blackout events are exercised across the Derkachi flight.
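
Randomized window starts could be drawn as sketched below. The sketch assumes the flight duration and the blackout window length are supplied by the caller, and it seeds the generator so a failing run can be reproduced; the seeding, the function name `sample_window_starts`, and its parameters are all assumptions rather than documented behavior of `blackout_spoof.py`.

```python
import random


def sample_window_starts(flight_duration_s, window_s, n_events=20, seed=0):
    """Draw one random blackout window start (seconds) per event."""
    if n_events < 20:
        raise ValueError("AC-1 requires at least 20 events")
    latest_start = flight_duration_s - window_s
    if latest_start < 0:
        raise ValueError("blackout window is longer than the flight")
    rng = random.Random(seed)  # seeded for reproducibility (assumed requirement)
    return [rng.uniform(0.0, latest_start) for _ in range(n_events)]
```

Each returned value would parameterize one replay iteration, so the window always fits inside the flight.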

**AC-2: latency p95**
Given the per-event latencies
Then `p95(latency) ≤ 600 ms`.
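
The per-event measurement and the p95 assertion could look like the following sketch. It assumes each event is an `(onset, emissions)` pair, where `emissions` is the time-ordered outbound capture as `(t_emit_s, source_label)` tuples, and it treats an emission at or after the onset timestamp as eligible. Function names and the linear-interpolation percentile are assumptions.

```python
import math


def percentile(samples, q):
    """Linear-interpolation percentile (assumed method); q is in [0, 100]."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def promotion_latency_ms(t_blackout_onset_s, emissions):
    """Latency to the first dead_reckoned emission at or after onset."""
    for t_emit, label in emissions:
        if t_emit >= t_blackout_onset_s and label == "dead_reckoned":
            return (t_emit - t_blackout_onset_s) * 1000.0
    raise ValueError("no dead_reckoned emission observed after onset")


def evaluate_spoof_promotion(events, min_events=20, budget_ms=600.0):
    """events: list of (t_blackout_onset_s, emissions) pairs, one per injected event."""
    if len(events) < min_events:
        raise ValueError(f"need at least {min_events} events, got {len(events)}")
    latencies = [promotion_latency_ms(onset, emissions) for onset, emissions in events]
    p95 = percentile(latencies, 95)
    return {"p95_ms": p95, "pass": p95 <= budget_ms}
```

Raising when no `dead_reckoned` emission follows an onset keeps a never-promoted event from being silently dropped from the sample.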

**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.

## System Under Test Boundary

End-to-end through public boundaries.

- **Allowed**: blackout_spoof injector timestamps, SITL outbound capture.
- **Forbidden**: instrumenting SUT internals for the switch event; the switch is detected via outbound stream.

## Constraints

- The injector emits a timestamp at the moment of onset (a public artifact); this is the `t_blackout_onset` reference.

## Document Dependencies

- `_docs/02_document/tests/performance-tests.md` § NFT-PERF-04
- `_docs/02_document/tests/test-data.md` § Performance (NFT-PERF-04 row)