# Code Review Report **Batch**: 89 — AZ-446 (CSV reporter refinements) **Date**: 2026-05-17 **Verdict**: PASS_WITH_WARNINGS ## Scope Files modified: - `e2e/runner/reporting/nfr_recorder.py` — extended `record_metric` with `band`, `ci95_low`, `ci95_high` (kw-only); added `_RunAggregator.emit_per_metric_report` and a `pytest_sessionfinish` call to it; introduced `_stringify` helper. No backwards-incompatible changes (existing 4-arg callers keep working). - `e2e/tests/resilience/test_nft_res_03_monte_carlo.py` — passes band for AC-1 (`iteration_count`) and AC-3 (`envelope_ratio`) and computes per-iteration empirical ci95 for `envelope_ratio`. - `e2e/tests/performance/test_nft_perf_01_e2e_latency.py` — passes band for AC-4 (`frame_drop_ratio`) and AC-2/AC-3 (`latency_ms_p95`); computes empirical 2.5/97.5 percentile of the raw latency samples as ci95 for p50/p95/p99. - `e2e/_unit_tests/reporting/test_nfr_recorder.py` — 6 new tests covering band+CI persistence, value-error on unbalanced CI, the new `report.csv` columns, explicit-path overload, and end-to-end emission via the pytest plugin. ## Findings | # | Severity | Category | File:Line | Title | |---|----------|----------|-----------|-------| | 1 | Low | Style | n/a | "Empirical CI" naming clarification — `ci95_low`/`ci95_high` are emitted as the central 95% range of the underlying sample distribution, not a confidence interval on a point estimator | | 2 | Medium | Maintainability | `e2e/runner/helpers/*_evaluator.py`, `e2e/tests/resource_limit/*` | Carried over from batches 85–87 and 88: duplicated `write_csv_evidence` + `_resolve_fixture_path` boilerplate. NOT in AZ-446 scope (AZ-446 is reporting-only). Tracked separately. | No Critical / High / Security findings. ## Finding Details ### F1: Empirical CI naming clarification (Low / Style) - Location: `e2e/runner/reporting/nfr_recorder.py` (docstring) + `_percentile_pair` helpers in NFT-RES-03 and NFT-PERF-01. - Description: the task spec says "Monte Carlo confidence intervals where applicable." The implementation emits the empirical 2.5/97.5 percentile of the underlying sample distribution. For `nft_res_03.envelope_ratio` this is the spread of per-iteration ratios — a defensible MC bootstrap-like signal. For `nft_perf_01.latency_ms_p{50,95,99}` it is the range of raw per-frame latency samples, NOT a CI on the percentile estimator. A parametric CI on the percentile would require a bootstrap loop (resampling) that materially expands the scope. - Resolution: docstring spells the semantics out; downstream tooling reads the same column whether the source is parametric or empirical. Acceptable trade-off for the 2-point task. - Tasks: AZ-446. ### F2: Cumulative-review carry-over from batch 88 (Medium / Maintainability) - Location: `e2e/runner/helpers/{memory,fdr_size,storage,thermal_envelope}_evaluator.py` and `e2e/tests/resource_limit/test_nft_lim_*_*.py`. - Description: AZ-446 was speculatively expected to absorb the cumulative-review carry-over for `write_csv_evidence` and `_resolve_fixture_path` duplication. Re-examining: AZ-446's spec is reporting-only (band annotations + CI95 columns on the per-metric `report.csv` + nfr_recorder API). The carried-over duplications live in evaluator helpers and scenario boilerplate, both outside the AZ-446 ownership envelope. A separate maintenance-only PBI is required. - Resolution: not addressed in this batch. The next cumulative review (batches 88–90) will re-flag it; if the user wants it consolidated sooner, a small "PBI" should be opened. - Tasks: not AZ-446. ## AC Test Coverage | Task | ACs | Coverage | |------|-----|----------| | AZ-446 | AC-1 (band) | Covered — `test_record_metric_band_kwarg_stored_in_internal_record`, `test_emit_per_metric_report_writes_csv_with_band_and_ci`, and the scenario callers in `test_nft_res_03_monte_carlo.py` + `test_nft_perf_01_e2e_latency.py`. The `report.csv` header always carries `value_band` (column exists per AC wording). | | AZ-446 | AC-2 (CI95 for Monte Carlo + N-sample) | Covered — `test_record_metric_ci95_pair_stored_in_internal_record`, `test_record_metric_ci95_unbalanced_rejected_via_fixture_wrapper`, `test_emit_per_metric_report_writes_csv_with_band_and_ci`, end-to-end `test_per_metric_report_emitted_in_pytest_run`. NFT-RES-03 + NFT-PERF-01 callers pass the new args. | | AZ-446 | AC-3 (parameterization — once per CI invocation) | Covered — the `report.csv` artifact is emitted from `pytest_sessionfinish` (single emission per session, regardless of how many parametrize combinations ran). End-to-end fixture-run unit test verifies emission. | ## Verdict Logic - Critical: 0 - High: 0 - Medium: 1 (carry-over from batch 88, not in AZ-446 scope) - Low: 1 (in-scope, documented in docstring) → **PASS_WITH_WARNINGS** ## Architecture Compliance (Phase 7) - All edits inside `e2e/**` (owned by `blackbox_tests`). No new imports from `src/gps_denied_onboard`. No new cyclic dependencies; `nfr_recorder` only adds new public methods + a private helper. - Public-API change is purely additive: `_NfrRecorder.record_metric` added kw-only `band`/`ci95_low`/`ci95_high` parameters with default `None`; `_RunAggregator.record_metric` mirrors them. Existing positional + named callers remain valid.