mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 20:31:12 +00:00

Files

T

Oleksandr Bezdieniezhnykh 33e683dc0f [AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter

Batch 89 — adds optional `band`, `ci95_low`, `ci95_high` kw-only
parameters to `_NfrRecorder.record_metric` and emits a new per-metric
report.csv artifact (one row per scenario × metric, columns:
scenario_id, metric_name, value, value_band, ci95_low, ci95_high,
ac_id, outcome). Backwards compatible — existing 4-arg callers
unchanged; unbalanced ci95 pair raises ValueError. report.csv is
written once per pytest session from `pytest_sessionfinish` so the
annotation pass runs once per CI invocation regardless of
(fc_adapter, vio_strategy) (AC-3). `regression-baseline.json`
intentionally kept flat to preserve the diff contract used by
regression-detection tooling.

NFT-RES-03 + NFT-PERF-01 scenarios updated to pass real bands and
compute empirical 2.5/97.5-percentile ci95 from their own sample
streams (per-iteration envelope ratios for Monte Carlo,
per-frame latency samples for N-sample latency).

Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446
band/CI behavior, value-error on unbalanced ci95, report.csv columns,
explicit-path override, and end-to-end emission via the pytest
plugin). Code review: PASS_WITH_WARNINGS — 1 Low (empirical-CI
semantics, documented inline), 1 Medium carried over from batch 88's
cumulative-review backlog (write_csv_evidence + _resolve_fixture_path
duplication is outside AZ-446 reporting scope).

This commit closes Step 10 Implement Tests for cycle 1 (41 of 41
blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to
Step 11 Run Tests next.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-17 18:14:00 +03:00

5.2 KiB

Raw Blame History

Code Review Report

Batch: 89 — AZ-446 (CSV reporter refinements) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS

Scope

Files modified:

e2e/runner/reporting/nfr_recorder.py — extended record_metric with band, ci95_low, ci95_high (kw-only); added _RunAggregator.emit_per_metric_report and a pytest_sessionfinish call to it; introduced _stringify helper. No backwards-incompatible changes (existing 4-arg callers keep working).
e2e/tests/resilience/test_nft_res_03_monte_carlo.py — passes band for AC-1 (iteration_count) and AC-3 (envelope_ratio) and computes per-iteration empirical ci95 for envelope_ratio.
e2e/tests/performance/test_nft_perf_01_e2e_latency.py — passes band for AC-4 (frame_drop_ratio) and AC-2/AC-3 (latency_ms_p95); computes empirical 2.5/97.5 percentile of the raw latency samples as ci95 for p50/p95/p99.
e2e/_unit_tests/reporting/test_nfr_recorder.py — 6 new tests covering band+CI persistence, value-error on unbalanced CI, the new report.csv columns, explicit-path overload, and end-to-end emission via the pytest plugin.

Findings

#	Severity	Category	File:Line	Title
1	Low	Style	n/a	"Empirical CI" naming clarification — `ci95_low`/`ci95_high` are emitted as the central 95% range of the underlying sample distribution, not a confidence interval on a point estimator
2	Medium	Maintainability	`e2e/runner/helpers/_evaluator.py`, `e2e/tests/resource_limit/`	Carried over from batches 85–87 and 88: duplicated `write_csv_evidence` + `_resolve_fixture_path` boilerplate. NOT in AZ-446 scope (AZ-446 is reporting-only). Tracked separately.

No Critical / High / Security findings.

Finding Details

F1: Empirical CI naming clarification (Low / Style)

Location: e2e/runner/reporting/nfr_recorder.py (docstring) + _percentile_pair helpers in NFT-RES-03 and NFT-PERF-01.
Description: the task spec says "Monte Carlo confidence intervals where applicable." The implementation emits the empirical 2.5/97.5 percentile of the underlying sample distribution. For nft_res_03.envelope_ratio this is the spread of per-iteration ratios — a defensible MC bootstrap-like signal. For nft_perf_01.latency_ms_p{50,95,99} it is the range of raw per-frame latency samples, NOT a CI on the percentile estimator. A parametric CI on the percentile would require a bootstrap loop (resampling) that materially expands the scope.
Resolution: docstring spells the semantics out; downstream tooling reads the same column whether the source is parametric or empirical. Acceptable trade-off for the 2-point task.
Tasks: AZ-446.

F2: Cumulative-review carry-over from batch 88 (Medium / Maintainability)

Location: e2e/runner/helpers/{memory,fdr_size,storage,thermal_envelope}_evaluator.py and e2e/tests/resource_limit/test_nft_lim_*_*.py.
Description: AZ-446 was speculatively expected to absorb the cumulative-review carry-over for write_csv_evidence and _resolve_fixture_path duplication. Re-examining: AZ-446's spec is reporting-only (band annotations + CI95 columns on the per-metric report.csv + nfr_recorder API). The carried-over duplications live in evaluator helpers and scenario boilerplate, both outside the AZ-446 ownership envelope. A separate maintenance-only PBI is required.
Resolution: not addressed in this batch. The next cumulative review (batches 88–90) will re-flag it; if the user wants it consolidated sooner, a small "PBI" should be opened.
Tasks: not AZ-446.

AC Test Coverage

Task	ACs	Coverage
AZ-446	AC-1 (band)	Covered — `test_record_metric_band_kwarg_stored_in_internal_record`, `test_emit_per_metric_report_writes_csv_with_band_and_ci`, and the scenario callers in `test_nft_res_03_monte_carlo.py` + `test_nft_perf_01_e2e_latency.py`. The `report.csv` header always carries `value_band` (column exists per AC wording).
AZ-446	AC-2 (CI95 for Monte Carlo + N-sample)	Covered — `test_record_metric_ci95_pair_stored_in_internal_record`, `test_record_metric_ci95_unbalanced_rejected_via_fixture_wrapper`, `test_emit_per_metric_report_writes_csv_with_band_and_ci`, end-to-end `test_per_metric_report_emitted_in_pytest_run`. NFT-RES-03 + NFT-PERF-01 callers pass the new args.
AZ-446	AC-3 (parameterization — once per CI invocation)	Covered — the `report.csv` artifact is emitted from `pytest_sessionfinish` (single emission per session, regardless of how many parametrize combinations ran). End-to-end fixture-run unit test verifies emission.

Verdict Logic

Critical: 0
High: 0
Medium: 1 (carry-over from batch 88, not in AZ-446 scope)
Low: 1 (in-scope, documented in docstring)

→ PASS_WITH_WARNINGS

Architecture Compliance (Phase 7)

All edits inside e2e/** (owned by blackbox_tests). No new imports from src/gps_denied_onboard. No new cyclic dependencies; nfr_recorder only adds new public methods + a private helper.
Public-API change is purely additive: _NfrRecorder.record_metric added kw-only band/ci95_low/ci95_high parameters with default None; _RunAggregator.record_metric mirrors them. Existing positional + named callers remain valid.

5.2 KiB Raw Blame History Unescape Escape