Oleksandr Bezdieniezhnykh 73cd632e95 [AZ-428] [AZ-429] [AZ-430] [AZ-431] Add NFT-PERF-01..04 perf scenarios
Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators.

- NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms
  (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage
  partition recording (D-CROSS-LATENCY-1).
- NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit
  windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP).
- NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s
  over N≥10 iterations.
- NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20
  randomized-start blackout+spoof events.

All scenarios consume external fixtures (AZ-595 dependency surfaced) and
fail loudly when fixtures are missing or empty. Public-boundary
discipline preserved — evaluators do NOT import src/gps_denied_onboard.

Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3
vio). Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch),
3 Low (production-dependency surfacings + future hygiene).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 16:46:49 +03:00


Batch 85 — AZ-428 + AZ-429 + AZ-430 + AZ-431 (Performance NFTs)

Tracker: AZ-428, AZ-429, AZ-430, AZ-431
Tasks: 4 tasks / 13 complexity points (5 + 2 + 5 + 3)*
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_85_review.md

*Note on points: the batch groups four tasks (13 points in total) because all four are Performance NFTs that share the _percentile helper, so their AC coverage is best reviewed together. Each individual task stays within the user batch rule of "create PBIs of 2-3 points (≤5)"; the batch grouping is intentional for shared-evaluator coherence.

Scope

  • AZ-428 / NFT-PERF-01 (AC-4.1) — Tier-2-only end-to-end latency p95 ≤ 400 ms across two configs (K=3@25 °C + K=2@50 °C hybrid); 5 min Derkachi replay; per-stage partition (D-CROSS-LATENCY-1) recorded for trend (informational).
  • AZ-429 / NFT-PERF-02 (AC-4.4) — Frame-by-frame streaming: p95(inter-emit) ≤ 350 ms; no ≥3 consecutive missed-emit window.
  • AZ-430 / NFT-PERF-03 (AC-NEW-1) — Tier-2-only cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations.
  • AZ-431 / NFT-PERF-04 (AC-NEW-2) — Spoofing-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events.

Files

Created

  • e2e/runner/helpers/streaming_evaluator.py — inter-emit + missed-emit-window evaluators; shared _percentile helper used by the other 3 evaluators.
  • e2e/runner/helpers/spoof_promotion_evaluator.py — per-event latency from t_blackout_onset → first dead_reckoned label switch + aggregate p50/p95/p99.
  • e2e/runner/helpers/ttff_evaluator.py — per-iteration TTFF samples + AC-3/AC-4 aggregate.
  • e2e/runner/helpers/e2e_latency_evaluator.py — per-frame latency + frame-drop accounting + per-stage partition recording.
  • e2e/tests/performance/test_nft_perf_01_e2e_latency.py — NFT-PERF-01 scenario (Tier-2; two configs).
  • e2e/tests/performance/test_nft_perf_02_streaming.py — NFT-PERF-02 scenario (Tier-1/2; fc-adapter-aware timestamp extraction).
  • e2e/tests/performance/test_nft_perf_03_ttff.py — NFT-PERF-03 scenario (Tier-2-only; fixture-consumer).
  • e2e/tests/performance/test_nft_perf_04_spoof_promotion.py — NFT-PERF-04 scenario (Tier-1/2; fixture-consumer).
  • e2e/_unit_tests/helpers/test_streaming_evaluator.py — 16 unit tests.
  • e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py — 15 unit tests.
  • e2e/_unit_tests/helpers/test_ttff_evaluator.py — 14 unit tests.
  • e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py — 15 unit tests.
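
The shared _percentile helper is named above but not shown in this report. The following is a minimal sketch of what such a helper could look like, assuming linear interpolation between closest ranks; the interpolation method, the public name `percentile`, and the error handling are assumptions, not the contents of streaming_evaluator.py.

```python
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0..100) of samples.

    Assumption: linear interpolation between closest ranks. The real
    helper may use a different method (for example nearest-rank).
    """
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    if not 0.0 <= q <= 100.0:
        raise ValueError(f"q must be within [0, 100], got {q}")
    ordered = sorted(samples)
    # Fractional rank into the sorted list.
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```

Raising on an empty input mirrors the "fail loudly" rule the scenarios apply to missing or empty fixtures.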

Modified

  • e2e/_unit_tests/test_directory_layout.py — registered 8 new paths.

Test Results

$ pytest e2e/_unit_tests/helpers/test_streaming_evaluator.py \
         e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py \
         e2e/_unit_tests/helpers/test_ttff_evaluator.py \
         e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py \
         e2e/_unit_tests/test_directory_layout.py
================ 177 passed in 0.34s ================

Scenario collection (24 cases, all parameterized):

$ pytest e2e/tests/performance/ --collect-only -p no:csv
collected 24 items
  test_nft_perf_01_e2e_latency: 6 cases
  test_nft_perf_02_streaming_inter_emit: 6 cases
  test_nft_perf_03_cold_start_ttff: 6 cases
  test_nft_perf_04_spoof_promotion_latency: 6 cases
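
The 24 collected items follow from the parameterization matrix: 4 tests × 2 fc adapters × 3 VIO strategies. The sketch below reproduces that arithmetic with `itertools.product`. The adapter and strategy ids are placeholders (the report does not list the real ids), and the id format is only an approximation of what pytest prints.

```python
from itertools import product

TESTS = (
    "test_nft_perf_01_e2e_latency",
    "test_nft_perf_02_streaming_inter_emit",
    "test_nft_perf_03_cold_start_ttff",
    "test_nft_perf_04_spoof_promotion_latency",
)
# Placeholder ids: the real fc_adapter / vio_strategy values are not
# listed in this report.
FC_ADAPTERS = ("fc_adapter_a", "fc_adapter_b")
VIO_STRATEGIES = ("vio_a", "vio_b", "vio_c")


def collected_ids():
    """Enumerate one id per (test, fc_adapter, vio_strategy) combination."""
    return [
        f"{test}[{fc}-{vio}]"
        for test, fc, vio in product(TESTS, FC_ADAPTERS, VIO_STRATEGIES)
    ]


if __name__ == "__main__":
    ids = collected_ids()
    print(len(ids), "items;", len(ids) // len(TESTS), "per test")
```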

Full unit suite: 977 passed, 2 failed — both failures are pre-existing (pytest-csv vs csv_reporter plugin conflict on subprocess pytest invocations); confirmed by git stash baseline. Not introduced by batch 85.

AC Verification

AZ-428 / NFT-PERF-01

| AC | Coverage |
|---|---|
| AC-1 tier guard | @pytest.mark.tier2_only |
| AC-2 K=3@25 °C p95 ≤ 400 ms | per-config assertion in scenario + 4 unit tests |
| AC-3 K=2 hybrid@50 °C p95 ≤ 400 ms | per-config assertion in scenario |
| AC-4 frame-drop ≤ 10 % | LatencyReport.passes_frame_drop + 3 unit tests |
| AC-5 partition recorded | write_partition_csv (informational; no threshold) + 1 unit test |
| AC-6 parameterization | 6 collected variants per config |
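
As an illustration of the AC-2/AC-3 and AC-4 gates, the sketch below computes p95 latency over delivered frames and the frame-drop ratio over all frames. `LatencySummary` and `summarize` are hypothetical stand-ins, not the report's LatencyReport; representing a dropped frame as `None` and using `statistics.quantiles` for p95 are both assumptions.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence

P95_LIMIT_MS = 400.0    # AC-2 / AC-3 threshold
MAX_DROP_RATIO = 0.10   # AC-4 threshold


@dataclass(frozen=True)
class LatencySummary:
    p95_ms: float
    drop_ratio: float

    @property
    def passes_p95(self) -> bool:
        return self.p95_ms <= P95_LIMIT_MS

    @property
    def passes_frame_drop(self) -> bool:
        return self.drop_ratio <= MAX_DROP_RATIO


def summarize(latencies_ms: Sequence[Optional[float]]) -> LatencySummary:
    """Summarize one config; None marks a dropped frame (assumption)."""
    if not latencies_ms:
        raise ValueError("no frames in fixture")
    delivered = [x for x in latencies_ms if x is not None]
    if len(delivered) < 2:
        raise ValueError("need at least two delivered frames for p95")
    # Index 94 of the 99 cut points is the 95th percentile.
    p95 = statistics.quantiles(delivered, n=100, method="inclusive")[94]
    dropped = len(latencies_ms) - len(delivered)
    return LatencySummary(p95_ms=p95, drop_ratio=dropped / len(latencies_ms))
```

Each of the two configs would be summarized separately, so that one config cannot mask a regression in the other.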

AZ-429 / NFT-PERF-02

| AC | Coverage |
|---|---|
| AC-1 p95 inter-emit ≤ 350 ms | evaluate_inter_emit.passes_p95 + 6 unit tests |
| AC-2 no ≥3 consecutive missed emits | evaluate_missed_emits.longest_run + 4 unit tests |
| AC-3 parameterization | 6 collected variants (fc_adapter × vio_strategy) |
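
The two NFT-PERF-02 checks can be sketched as follows. This is an illustration only: it assumes emit timestamps arrive as a sorted list of milliseconds and that missed emits are available as one boolean per input frame. How the real evaluators derive those inputs from tlog or MSP captures is not described in this report, and the function names here are hypothetical.

```python
import statistics
from typing import Sequence

P95_LIMIT_MS = 350.0   # AC-1 threshold
MAX_MISSED_RUN = 2     # AC-2: a run of 3 or more misses fails


def inter_emit_p95_ms(emit_ts_ms: Sequence[float]) -> float:
    """p95 of the gaps between consecutive emit timestamps."""
    if len(emit_ts_ms) < 3:
        raise ValueError("need at least three emits to compute p95 of gaps")
    gaps = [b - a for a, b in zip(emit_ts_ms, emit_ts_ms[1:])]
    if any(gap < 0 for gap in gaps):
        raise ValueError("emit timestamps must be non-decreasing")
    return statistics.quantiles(gaps, n=100, method="inclusive")[94]


def longest_missed_run(emitted: Sequence[bool]) -> int:
    """Length of the longest run of consecutive frames with no emit."""
    longest = current = 0
    for ok in emitted:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```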

AZ-430 / NFT-PERF-03

| AC | Coverage |
|---|---|
| AC-1 tier guard | @pytest.mark.tier2_only |
| AC-2 clean state per iteration | delegated to Tier-2 harness (AZ-444); surfaced as F3 |
| AC-3 p95(TTFF) ≤ 30 s | te.evaluate.passes_p95 + 4 unit tests |
| AC-4 max(TTFF) ≤ 45 s | te.evaluate.passes_max + 2 unit tests |
| AC-5 parameterization | 6 collected variants |
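
AC-3 and AC-4 are two independent gates over the same samples, so a single slow iteration can fail p95 while staying under the max limit. A minimal sketch, assuming TTFF samples are plain seconds and p95 is interpolated; `TtffSummary` and `evaluate_ttff` are hypothetical names, not the ttff_evaluator API.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence

MIN_ITERATIONS = 10   # N >= 10
P95_LIMIT_S = 30.0    # AC-3
MAX_LIMIT_S = 45.0    # AC-4


@dataclass(frozen=True)
class TtffSummary:
    p95_s: float
    max_s: float

    @property
    def passes_p95(self) -> bool:
        return self.p95_s <= P95_LIMIT_S

    @property
    def passes_max(self) -> bool:
        return self.max_s <= MAX_LIMIT_S


def evaluate_ttff(samples_s: Sequence[float]) -> TtffSummary:
    """Aggregate per-iteration cold-start TTFF samples (seconds)."""
    if len(samples_s) < MIN_ITERATIONS:
        raise ValueError(
            f"need at least {MIN_ITERATIONS} iterations, got {len(samples_s)}"
        )
    p95 = statistics.quantiles(samples_s, n=100, method="inclusive")[94]
    return TtffSummary(p95_s=p95, max_s=max(samples_s))
```

With nine samples at 20 s and one at 44 s, the interpolated p95 is about 33.2 s: the max gate passes while the p95 gate fails.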

AZ-431 / NFT-PERF-04

| AC | Coverage |
|---|---|
| AC-1 N≥20 events | evaluate.passes_event_count + scenario fixture validation |
| AC-2 p95 ≤ 600 ms | evaluate.passes_p95 + 4 unit tests |
| AC-3 parameterization | 6 collected variants |
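
Promotion latency is measured per event from blackout onset to the first outbound sample labeled dead_reckoned. The sketch below illustrates that measurement and the two gates. It assumes each event carries time-ordered `(t_ms, source_label)` samples, and it treats an event with no promotion as a failure; both the data shape and the function names are assumptions rather than the spoof_promotion_evaluator API.

```python
import statistics
from typing import Optional, Sequence, Tuple

MIN_EVENTS = 20        # AC-1
P95_LIMIT_MS = 600.0   # AC-2
PROMOTED_LABEL = "dead_reckoned"


def promotion_latency_ms(
    onset_ms: float, samples: Sequence[Tuple[float, str]]
) -> Optional[float]:
    """Latency from blackout onset to the first dead_reckoned sample.

    Samples are assumed sorted by time. Returns None when the label
    never switches at or after the onset.
    """
    for t_ms, label in samples:
        if t_ms >= onset_ms and label == PROMOTED_LABEL:
            return t_ms - onset_ms
    return None


def passes(latencies_ms: Sequence[Optional[float]]) -> bool:
    """Apply the event-count and p95 gates to per-event latencies."""
    if len(latencies_ms) < MIN_EVENTS:
        return False
    if any(value is None for value in latencies_ms):
        return False  # assumption: a missing promotion fails the scenario
    p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]
    return p95 <= P95_LIMIT_MS
```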

traces_to markers:

  • NFT-PERF-01: AC-4.1,AC-1,AC-2,AC-3,AC-4,AC-5,AC-6
  • NFT-PERF-02: AC-4.4,AC-1,AC-2,AC-3
  • NFT-PERF-03: AC-NEW-1,AC-1,AC-2,AC-3,AC-4,AC-5
  • NFT-PERF-04: AC-NEW-2,AC-1,AC-2,AC-3

Code Review

Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 1 Medium, 3 Low.

  • F1 (Medium / Maintainability — fixed in batch): NFT-PERF-04's _resolve_events_fixture_path duplicated the sitl_observer import across two branches. Hoisted to function-top during the review pass.
  • F2 (Low / Spec-Gap surfacing): Production dep — blackout_spoof.py injector cannot emit N=20 randomized-start events; scenario consumes external fixture from AZ-595 fixture builder. Surfaced + tracked.
  • F3 (Low / Spec-Gap surfacing): AZ-430 AC-2 (per-iteration clean state) delegated to Tier-2 harness (AZ-444). Scenario only consumes the captured fixture.
  • F4 (Low / Maintainability): CSV-emit boilerplate duplicated across 4 evaluators. Future hygiene PBI.
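
For F4, one possible shape for a shared CSV helper is sketched below. This is not part of batch 85: the name `write_rows_csv`, its signature, and the choice of `csv.DictWriter` are assumptions about what a future hygiene PBI might consolidate.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping, Sequence


def write_rows_csv(
    path: Path,
    fieldnames: Sequence[str],
    rows: Iterable[Mapping[str, object]],
) -> int:
    """Write a header plus one line per row; return the row count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Each evaluator would then pass only its own column names and rows, which removes the duplicated open/header/write loop.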

Full review: _docs/03_implementation/reviews/batch_85_review.md.

Production Dependencies

Surfaced for the cumulative review window (85-87) + traceability matrix:

  1. AZ-444 (Tier-2 runner): per-iteration fdr-output volume wipe + SUT cold lifecycle restart for NFT-PERF-03; tier2-on-jetson.sh orchestration of N=10 iterations.
  2. AZ-595 (fixture builder): emit nft_perf_01_latency.json (N=900 frames × 2 configs + per-stage partition samples), nft_perf_02_streaming capture, nft_perf_03_ttff.json (N≥10 iteration records), nft_perf_04_events.json (N≥20 randomized-start blackout+spoof events with per-event outbound-label samples).
  3. SUT-side: outbound stream MUST carry source_label ∈ {satellite_anchored, visual_propagated, dead_reckoned} for NFT-PERF-04 to detect promotion; FDR (or equivalent) MUST expose per-stage timings (C1, C2, C2.5, C3, C3.5, C4, C4 cov, C5, serialization, OS jitter) for NFT-PERF-01 AC-5 partition recording.
  4. AZ-595 + Derkachi flight: K=2 + Jacobian-cov hybrid auto-degrade configuration must be activatable from fixture-builder side so the K=2@50 °C config captures the right SUT mode.
  5. Already exists: sitl_replay_ready fixture, mavproxy_tlog_reader, msp_frame_observer, sitl_observer.replay_dir(), evidence_dir, nfr_recorder (AZ-406).
  6. Already exists: fc_adapter / vio_strategy parameterization, tier2_only marker, scenario_id marker, traces_to marker.
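
Because the scenarios depend on externally built fixtures (item 2), they fail loudly when a fixture is missing or empty. A minimal sketch of such a loader, assuming JSON fixtures on disk; `FixtureError` and `load_fixture` are hypothetical names, and the real scenarios may raise or skip differently.

```python
import json
from pathlib import Path
from typing import Any


class FixtureError(RuntimeError):
    """Raised when a required external fixture cannot be used."""


def load_fixture(path: Path) -> Any:
    """Load a JSON fixture, refusing missing or empty inputs."""
    if not path.is_file():
        raise FixtureError(f"fixture missing: {path}")
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        raise FixtureError(f"fixture is empty: {path}")
    data = json.loads(text)
    if not data:
        # An empty list or object would let every threshold pass vacuously.
        raise FixtureError(f"fixture has no records: {path}")
    return data
```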

Architecture Compliance

  • All new files under e2e/, owned by the Blackbox Tests component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard (verified — explicit "does NOT import" notes in evaluator docstrings).
  • No new cyclic dependencies. New evaluators share streaming_evaluator._percentile only.
  • No new infrastructure libraries.
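
The public-boundary rule (no imports from src/gps_denied_onboard) could also be checked mechanically rather than by docstring notes. The sketch below uses the standard `ast` module to flag offending imports in a source string; it is an illustration, not a check that exists in this batch, and `forbidden_imports` is a hypothetical name.

```python
import ast
from typing import List

FORBIDDEN_ROOT = "gps_denied_onboard"


def forbidden_imports(source: str) -> List[str]:
    """Return imported module names that reach into the SUT package."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if FORBIDDEN_ROOT in name.split("."):
                found.append(name)
    return found
```

Running this over each file under e2e/runner/helpers/ and asserting an empty result would turn the boundary into a unit test.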

Sub-step Trace

Phases executed per implement/SKILL.md:

  • phase 5 (load-spec) → 4 task specs read
  • phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests for all 4 tasks
  • phase 7 (verify-ac-coverage) → ACs traced above
  • phase 8 (code-review) → batch_85_review.md (PASS_WITH_WARNINGS)
  • phase 8.5 (cumulative-review) → deferred to batch 87 (K=3 window starts at batch 85)
  • phase 11 (commit-batch) → next.