azaion/gps-denied-onboard

mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 08:21:13 +00:00

Oleksandr Bezdieniezhnykh bf13549b32

2026-05-20 08:05:35 +03:00

Performance Test Run — 2026-05-19 — workstation Tier-1 probe

- Invoked by: autodev greenfield Step 15 (.cursor/skills/test-run/SKILL.md, perf mode).
- Host: developer Mac workstation (no Jetson hardware, no E2E_SITL_REPLAY_DIR fixture mounted).
- Runner: scripts/run-performance-tests.sh + direct pytest e2e/tests/performance/ probe.
- Run ID: workstation-tier1-probe.
- Status: Unverified across all 4 production perf NFRs; pure-logic evaluator unit tests Pass (70/70). No regression detected because no measurement was possible. No Warn / Fail to gate on. Not blocking deploy per the skill's "Any Unverified scenarios with no Warn/Fail" rule.

What ran

A) `scripts/run-performance-tests.sh`

Tier-2 perf tests skipped (GPS_DENIED_TIER!=2).
exit=0

The runner script is deliberately a Tier-2 gate: it runs pytest -m tier2 -q tests/perf only when GPS_DENIED_TIER=2. On Tier-1 / workstation it prints the skip message shown above and exits 0. This is by design: the canonical perf measurements require Jetson Orin Nano Super hardware (D-C7-9, JetPack 6.2, TensorRT 10.3). A workstation run would produce numbers that do not correspond to the pinned-hardware budgets and would actively mislead trend tracking.
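
The gate described above can be sketched in a few lines. The real runner is a shell script whose source is not reproduced in this report, so the function name `perf_runner_command` and its structure are illustrative assumptions; only the environment variable, the pytest arguments, and the skip message come from the text above.

```python
def perf_runner_command(env):
    """Return the pytest argv for a Tier-2 run, or None when the gate skips.

    Hypothetical helper: it mirrors the behaviour described in this report,
    not the actual contents of scripts/run-performance-tests.sh.
    """
    if env.get("GPS_DENIED_TIER") != "2":
        # Tier-1 / workstation: nothing is measured, the caller exits 0.
        return None
    return ["pytest", "-m", "tier2", "-q", "tests/perf"]


def describe(env):
    """Render what the runner would do for the given environment mapping."""
    command = perf_runner_command(env)
    if command is None:
        return "Tier-2 perf tests skipped (GPS_DENIED_TIER!=2)."
    return "run: " + " ".join(command)
```

Calling `describe({})` on a workstation yields the same skip line as the run above, while `describe({"GPS_DENIED_TIER": "2"})` yields the pytest invocation.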

B) Direct `pytest e2e/tests/performance/` probe (24 parameterizations)

| NFR | Configs | Outcome | Skip reason |
|---|---|---|---|
| NFT-PERF-01 (E2E latency p95 ≤ 400 ms; AC-4.1) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 6 skipped | "Tier-2 only — Jetson hardware required" |
| NFT-PERF-02 (frame-by-frame streaming, inter-emit p95 ≤ 350 ms; AC-4.4) | 6 ({ardupilot, inav} × {okvis2, klt_ransac, vins_mono}) | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) carrying the 5 min Derkachi @ 3 Hz replay" |
| NFT-PERF-03 (cold-start TTFF p95 ≤ 30 s; AC-NEW-1) | 6 | 6 skipped | "Tier-2 only — Jetson hardware required" |
| NFT-PERF-04 (spoof-promotion p95 ≤ 600 ms; AC-NEW-2) | 6 | 4 skipped (no fixture) + 2 skipped (vins_mono research-only per D-C1-1-SUB-A) | "requires `E2E_SITL_REPLAY_DIR` (AZ-595) containing N≥20 randomized-start blackout+spoof events" |

Total: 24 skipped, 0 passed, 0 failed, 0 errored. Exit code 0.
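
The skip counts in the table follow directly from the parameter grid. The sketch below re-derives them for this workstation run only (Tier-1, no replay fixture mounted). The constant and function names are hypothetical, and it assumes NFT-PERF-03 and NFT-PERF-04 use the same 2 × 3 grid as the other two NFRs, which the table implies but does not spell out.

```python
from collections import Counter
from itertools import product

FLIGHT_CONTROLLERS = ("ardupilot", "inav")
STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")

# Grouping taken from the skip reasons in the table above.
HARDWARE_GATED = ("NFT-PERF-01", "NFT-PERF-03")
FIXTURE_GATED = ("NFT-PERF-02", "NFT-PERF-04")


def workstation_skip_reason(nfr, strategy):
    """Skip reason on a Tier-1 host with no replay fixture mounted."""
    if nfr in HARDWARE_GATED:
        return "Tier-2 only"
    if strategy == "vins_mono":
        return "vins_mono research-only"
    return "no E2E_SITL_REPLAY_DIR fixture"


def tally_workstation_probe():
    """Count skip reasons over all NFR x flight-controller x strategy cases."""
    reasons = Counter()
    for nfr in HARDWARE_GATED + FIXTURE_GATED:
        for _fc, strategy in product(FLIGHT_CONTROLLERS, STRATEGIES):
            reasons[workstation_skip_reason(nfr, strategy)] += 1
    return reasons
```

The tally gives 12 hardware-gated skips, 4 research-only skips, and 8 missing-fixture skips, which sum to the 24 skipped parameterizations reported above.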

C) Pure-logic evaluator unit tests — `e2e/_unit_tests/helpers/test_*_evaluator.py`

The four perf NFRs each map to a pure-logic evaluator that computes the gate (p95 / inter-emit interval / TTFF distribution / spoof-promotion latency) from a recorded sample set. These evaluators are tested without any SITL / Jetson dependency:

e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py      → covers NFT-PERF-01 AC-2/3/4 math
e2e/_unit_tests/helpers/test_streaming_evaluator.py        → covers NFT-PERF-02 AC-1/AC-2 math
e2e/_unit_tests/helpers/test_ttff_evaluator.py             → covers NFT-PERF-03 AC-3/AC-4 math
e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py  → covers NFT-PERF-04 AC-1/AC-2 math

$ .venv/bin/python -m pytest e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py \
                              e2e/_unit_tests/helpers/test_streaming_evaluator.py \
                              e2e/_unit_tests/helpers/test_ttff_evaluator.py \
                              e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py \
                              --no-header -q
......................................................................   [100%]
70 passed in 0.50s

All 70 tests pass. This confirms that the threshold-comparison logic (percentile estimators, inter-emit interval, TTFF distribution, spoof-onset → label-switch delta) behaves correctly regardless of whether real measurements have been recorded yet. A future hardware run feeds JSON fixtures into the same evaluators; only the input data changes, not the math.
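
To make "pure-logic evaluator" concrete, here is a minimal sketch of two of the computations named above: a p95 estimate and the inter-emit intervals derived from emit timestamps. It uses the nearest-rank percentile definition as an assumption; this report does not say which estimator the repository's evaluators implement, and both function names are hypothetical.

```python
def p95_nearest_rank(samples):
    """Nearest-rank 95th percentile: the smallest sample with >= 95 % at or below it."""
    if not samples:
        raise ValueError("p95 is undefined for an empty sample set")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def inter_emit_intervals(emit_times_ms):
    """Differences between consecutive emit timestamps, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(emit_times_ms, emit_times_ms[1:])]
```

For example, `inter_emit_intervals([0, 300, 650, 900])` returns `[300, 350, 250]`, and `p95_nearest_rank` of that list is `350`, which would sit exactly on the NFT-PERF-02 limit of 350 ms.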

Threshold comparison (Step 3 of skill)

Per the skill's Step 3, thresholds load from _docs/02_document/tests/performance-tests.md. The thresholds exist and are documented but no scenario produced a measurement to compare them against.

| NFR | Threshold | Observed | Verdict |
|---|---|---|---|
| NFT-PERF-01 | p95 ≤ 400 ms (K=3 baseline AND K=2 hybrid auto-degrade) + ≤10 % frame drops | not measured | Unverified (Tier-2 hardware required) |
| NFT-PERF-02 | p95 inter-emit interval ≤ 350 ms; no window of ≥3 missed-emit gaps | not measured | Unverified (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
| NFT-PERF-03 | p95 TTFF < 30 s (50 cold boots) | not measured | Unverified (Tier-2 hardware required) |
| NFT-PERF-04 | p95 < 3 s on both FCs (50 trials per FC) | not measured | Unverified (`E2E_SITL_REPLAY_DIR` fixture not yet recorded; AZ-595) |
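
The verdict column can be read as a small function of (threshold, observation). The sketch below covers only the p95 part of each threshold, with limits copied from the table above. The frame-drop and missed-emit-gap conditions and any Warn band are not modelled, because this report does not define them. `Threshold`, `THRESHOLDS`, and `verdict` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    limit_ms: float
    inclusive: bool  # True for "<=", False for "<"


# p95 limits as listed in the threshold table of this report.
THRESHOLDS = {
    "NFT-PERF-01": Threshold(400.0, True),
    "NFT-PERF-02": Threshold(350.0, True),
    "NFT-PERF-03": Threshold(30_000.0, False),
    "NFT-PERF-04": Threshold(3_000.0, False),
}


def verdict(nfr: str, observed_p95_ms: Optional[float]) -> str:
    """Map an observed p95 (or None when nothing was measured) to a verdict."""
    if observed_p95_ms is None:
        return "Unverified"
    threshold = THRESHOLDS[nfr]
    if threshold.inclusive:
        ok = observed_p95_ms <= threshold.limit_ms
    else:
        ok = observed_p95_ms < threshold.limit_ms
    return "Pass" if ok else "Fail"
```

With no measurement available, every NFR in this run maps to `"Unverified"`, which is what the table records.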

Classification

Per the skill's perf-mode reporting:

══════════════════════════════════════
 PERF RESULTS
══════════════════════════════════════
 Scenarios: [pass 0 · warn 0 · fail 0 · unverified 4]
──────────────────────────────────────
 1. NFT-PERF-01 — Unverified — Tier-2 Jetson hardware required
 2. NFT-PERF-02 — Unverified — SITL replay fixture pending (AZ-595)
 3. NFT-PERF-03 — Unverified — Tier-2 Jetson hardware required
 4. NFT-PERF-04 — Unverified — SITL replay fixture pending (AZ-595)
──────────────────────────────────────
 Pure-logic evaluator coverage: 70/70 unit tests pass
 (e2e/_unit_tests/helpers/test_{e2e_latency,streaming,ttff,spoof_promotion}_evaluator.py)
══════════════════════════════════════

Coverage gap assessment (skill Step 5: "Unverified")

Per the skill:

Any Unverified scenarios with no Warn/Fail → not blocking, but surface them in the report so the user knows coverage gaps exist. Suggest running /test-spec to add expected results next cycle.

This run has 0 Warn + 0 Fail + 4 Unverified, so:

- Not deploy-blocking. The perf gate is allowed to be Unverified when the SUT is not yet running on its canonical hardware.
- Coverage gap is fully cataloged. Each Unverified scenario points at a concrete task:
  - NFT-PERF-01 / NFT-PERF-03: AZ-444 (Tier-2 Jetson harness) is the recording-phase task. When AZ-444 lands, these scenarios run on the Jetson and produce numbers, at which point this report's "Unverified" entries become "Pass / Warn / Fail" against the AC-4.1 / AC-NEW-1 thresholds.
  - NFT-PERF-02 / NFT-PERF-04: AZ-595 (SITL replay fixture builder) is the recording task. When AZ-595 lands, the fixtures are committed under e2e/fixtures/sitl_replay/, E2E_SITL_REPLAY_DIR is set, and the scenarios run on Tier-1.
- The thresholds, evaluators, parameterizations, and report wiring are all in place. Recording is the only gap, not test design.
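
The rule quoted from the skill reduces to a count over the per-scenario verdicts. The sketch below reproduces the summary line from the PERF RESULTS block and the "Unverified with no Warn/Fail" check. How the skill handles Warn or Fail verdicts is not described in this report, so the sketch does not model it; the function names are hypothetical.

```python
from collections import Counter

RUN_VERDICTS = {
    "NFT-PERF-01": "Unverified",
    "NFT-PERF-02": "Unverified",
    "NFT-PERF-03": "Unverified",
    "NFT-PERF-04": "Unverified",
}


def summary_line(verdicts):
    """Format the scenario counts the way the PERF RESULTS block shows them."""
    counts = Counter(v.lower() for v in verdicts.values())
    return "Scenarios: [pass {} · warn {} · fail {} · unverified {}]".format(
        counts["pass"], counts["warn"], counts["fail"], counts["unverified"]
    )


def is_unverified_only(verdicts):
    """True when the quoted rule applies: Unverified present, no Warn, no Fail."""
    counts = Counter(v.lower() for v in verdicts.values())
    return counts["unverified"] > 0 and counts["warn"] == 0 and counts["fail"] == 0
```

For this run `is_unverified_only(RUN_VERDICTS)` is true, so the run is reported as not deploy-blocking with its coverage gaps surfaced.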

Anti-patterns explicitly NOT used

Per the skill's anti-pattern guidance:

- No improvised perf tests. Did not synthesize a workstation-only "approximation" of any NFR; the AC-4.1 / AC-NEW-1 / AC-NEW-2 / AC-4.4 budgets are pinned to canonical hardware and synthetic Tier-1 numbers would mislead the trend-tracker.
- No skip-acceptance without justification. Each Unverified entry is cataloged against a concrete recording task (AZ-444 / AZ-595).
- No threshold downgrade. Did not soften any threshold to make a Tier-1 measurement "pass".

Two minor housekeeping items (Low)

- Unregistered pytest mark `tier2_only`: pytest emits warnings at e2e/tests/performance/test_nft_perf_01_e2e_latency.py:61 and e2e/tests/performance/test_nft_perf_03_ttff.py:48. Add the entry `tier2_only: marks scenarios that require Jetson hardware` to the `markers` list in e2e/runner/pytest.ini.
- scripts/run-performance-tests.sh is intentionally a Tier-2 stub. This is documented in the script header; it is not a defect, just a reminder that the Tier-1 path is "skip + log" by design. If a Tier-1 perf trend-tracking workflow is ever desired, add an explicit branch (e.g. invoke the pure-logic evaluators against a smaller derkachi-short-fixture).
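
For the first item, the report's recommendation is a `markers` entry in e2e/runner/pytest.ini. The same marker can also be registered from a `conftest.py` hook, sketched below. `pytest_configure` and `config.addinivalue_line` are standard pytest APIs. The assumption here is that a `conftest.py` is collected for e2e/tests/performance/; this report does not confirm which one.

```python
# Hypothetical conftest.py fragment: registers the marker at startup so pytest
# no longer warns about an unknown mark. Equivalent in effect to listing it
# under `markers` in pytest.ini.
TIER2_ONLY_MARKER = "tier2_only: marks scenarios that require Jetson hardware"


def pytest_configure(config):
    """pytest hook: add the tier2_only marker to the registered markers."""
    config.addinivalue_line("markers", TIER2_ONLY_MARKER)
```

Either route should clear the warnings; keeping the entry in pytest.ini, as the report suggests, keeps all marker definitions in one place.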

Cross-Reference Index

| Source | Purpose |
|---|---|
| `_docs/02_document/tests/performance-tests.md` | Threshold + scenario spec |
| `scripts/run-performance-tests.sh` | Runner script (current Tier-2 stub) |
| `_docs/02_tasks/todo/AZ-444*` | Tier-2 Jetson harness (recording-phase task) |
| `_docs/02_tasks/todo/AZ-595*` | SITL replay fixture builder (recording task) |
| `_docs/02_tasks/todo/AZ-{428..431}*` | NFT-PERF-{01..04} scenario tasks (currently complete on the runner side; the harness side is pending) |
| `_docs/06_metrics/` (this directory) | Per-run perf trend artefacts |
