gps-denied-onboard/_docs/03_implementation/batch_89_cycle1_report.md
Oleksandr Bezdieniezhnykh 33e683dc0f [AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter
Batch 89 — adds optional `band`, `ci95_low`, `ci95_high` kw-only
parameters to `_NfrRecorder.record_metric` and emits a new per-metric
report.csv artifact (one row per scenario × metric, columns:
scenario_id, metric_name, value, value_band, ci95_low, ci95_high,
ac_id, outcome). Backwards compatible — existing 4-arg callers
unchanged; unbalanced ci95 pair raises ValueError. report.csv is
written once per pytest session from `pytest_sessionfinish` so the
annotation pass runs once per CI invocation regardless of
(fc_adapter, vio_strategy) (AC-3). `regression-baseline.json`
intentionally kept flat to preserve the diff contract used by
regression-detection tooling.

NFT-RES-03 + NFT-PERF-01 scenarios updated to pass real bands and
compute empirical 2.5/97.5-percentile ci95 from their own sample
streams (per-iteration envelope ratios for Monte Carlo,
per-frame latency samples for N-sample latency).

Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446
band/CI behavior, value-error on unbalanced ci95, report.csv columns,
explicit-path override, and end-to-end emission via the pytest
plugin). Code review: PASS_WITH_WARNINGS — 1 Low (empirical-CI
semantics, documented inline), 1 Medium carried over from batch 88's
cumulative-review backlog (write_csv_evidence + _resolve_fixture_path
duplication is outside AZ-446 reporting scope).

This commit closes Step 10 Implement Tests for cycle 1 (41 of 41
blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to
Step 11 Run Tests next.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 18:14:00 +03:00


Batch Report

Batch: 89
Tasks: AZ-446 (CSV reporter refinements: trend-line + acceptance-band annotations + Monte Carlo CI)
Date: 2026-05-17
Cycle: 1
Complexity: 2 points

Task Results

| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|---|---|---|---|---|---|
| AZ-446_csv_reporter_refinements | Done | 1 source (nfr_recorder), 2 scenarios (nft_res_03 + nft_perf_01), 1 unit test (test_nfr_recorder) | pass | 3/3 | F1 Low (CI naming semantics, in-scope), F2 Medium (carry-over from batch 88, not in scope) |

AC Test Coverage: All covered (3 of 3 ACs)

Code Review Verdict: PASS_WITH_WARNINGS

See _docs/03_implementation/reviews/batch_89_review.md. 0 Critical / 0 High / 1 Medium (cumulative-review carry-over from batches 85–88: the write_csv_evidence + _resolve_fixture_path duplication is outside AZ-446 scope and will surface again at the next cumulative review) / 1 Low (empirical-CI naming semantics, documented in the nfr_recorder docstring).
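The Low finding concerns the fact that the `ci95_low` / `ci95_high` values are empirical 2.5/97.5 percentiles of a sample stream rather than a parametric confidence interval. As a rough illustration of that kind of interval (not the scenarios' actual code), the sketch below uses the standard-library `statistics.quantiles`. The function name `empirical_ci95` and the choice of the `inclusive` interpolation method are assumptions; the report does not state which interpolation rule the scenarios use.

```python
import statistics


def empirical_ci95(samples):
    """Return the (2.5th, 97.5th) empirical percentiles of `samples`.

    Hypothetical helper for illustration only; it is not taken from
    nfr_recorder or from the NFT-RES-03 / NFT-PERF-01 scenarios.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples for a percentile interval")
    # n=40 yields 39 cut points spaced 2.5% apart:
    # the first is the 2.5th percentile, the last is the 97.5th.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Example with 41 evenly spaced values, 0..40.
low, high = empirical_ci95(list(range(41)))
```

Because the bounds are read directly off the observed samples, they describe the spread of the sample stream itself, which is why the review flags the "ci95" name as a semantics question.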

Auto-Fix Attempts: 0

Stuck Agents: None

Test Results

  • e2e/_unit_tests/reporting/test_nfr_recorder.py — 14 tests pass (8 pre-existing + 6 new for AZ-446 band/CI behavior).
  • Full e2e unit-test suite: 1229 passed in 134 s (+6 vs. batch 88).
  • No scenario-level regressions: NFT-RES-03 + NFT-PERF-01 scenarios continue to skip cleanly via sitl_replay_ready / tier2_only gating in the Tier-1 docker harness.

API Change Summary

_NfrRecorder.record_metric (and the underlying _RunAggregator):

record_metric(
    name, value, ac_id=None, *,
    band: str | None = None,
    ci95_low: float | None = None,
    ci95_high: float | None = None,
)
  • All new params kw-only, default None — fully backwards compatible.
  • Unbalanced ci95_* pair (only one of ci95_low / ci95_high supplied) → ValueError.
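The two rules above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the real `_NfrRecorder.record_metric`: the name `record_metric_sketch`, the returned dict, and the error message are hypothetical, and only the parameter list and the ValueError behavior come from this report.

```python
from __future__ import annotations


def record_metric_sketch(
    name,
    value,
    ac_id=None,
    *,
    band: str | None = None,
    ci95_low: float | None = None,
    ci95_high: float | None = None,
):
    """Hypothetical stand-in mirroring the keyword-only signature above."""
    # An "unbalanced" pair means exactly one of the two bounds was given.
    if (ci95_low is None) != (ci95_high is None):
        raise ValueError("ci95_low and ci95_high must be supplied together")
    # The dict layout is an assumption made for this sketch.
    return {
        "metric_name": name,
        "value": value,
        "ac_id": ac_id,
        "value_band": band,
        "ci95_low": ci95_low,
        "ci95_high": ci95_high,
    }


# Legacy positional call: still valid, annotations default to None.
legacy = record_metric_sketch("latency_ms", 12.5, "AC-1")

# Annotated call: band and both CI bounds passed by keyword.
annotated = record_metric_sketch(
    "latency_ms", 12.5, "AC-1", band="pass", ci95_low=11.0, ci95_high=14.0
)
```

Making the new parameters keyword-only means a caller cannot accidentally pass a band or a bound positionally, which keeps older call sites unambiguous.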

New artifact:

  • <evidence_dir>/report.csv (one row per (scenario, metric)) — columns: scenario_id, metric_name, value, value_band, ci95_low, ci95_high, ac_id, outcome. Emitted once per pytest session by _PluginHooks.pytest_sessionfinish (AC-3).
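To make the row layout concrete, here is a hedged sketch of writing rows with those eight columns using the standard-library `csv` module. The helper name `write_report_csv`, the in-memory stream, and the sample values (including the band and outcome strings) are assumptions for illustration; only the column names and their order are taken from this report.

```python
import csv
import io

# Column order as listed in this report.
REPORT_COLUMNS = [
    "scenario_id", "metric_name", "value", "value_band",
    "ci95_low", "ci95_high", "ac_id", "outcome",
]


def write_report_csv(rows, stream):
    """Write one CSV row per (scenario, metric) dict; hypothetical helper."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing or None annotations become empty cells.
        writer.writerow(
            {col: "" if row.get(col) is None else row[col] for col in REPORT_COLUMNS}
        )


# Sample rows with made-up values.
rows = [
    {"scenario_id": "NFT-PERF-01", "metric_name": "latency_ms", "value": 12.5,
     "value_band": "pass", "ci95_low": 11.0, "ci95_high": 14.0,
     "ac_id": "AC-1", "outcome": "passed"},
    {"scenario_id": "NFT-RES-03", "metric_name": "envelope_ratio", "value": 0.97,
     "outcome": "passed"},
]
buffer = io.StringIO()
write_report_csv(rows, buffer)
report_text = buffer.getvalue()
```

In a pytest plugin, a function like this would typically be called from the `pytest_sessionfinish(session, exitstatus)` hook with a file opened under the evidence directory, so the file is produced once per session rather than once per test.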

regression-baseline.json schema is unchanged (flat {metric: numeric}) to preserve the diff contract used by regression-detection tooling.

Next Batch: None — all selected test-implementation tasks are done (Step 10 Implement Tests complete for cycle 1).