Mirror of https://github.com/azaion/gps-denied-onboard.git (synced 2026-06-22 17:11:14 +00:00)
[AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter
Batch 89: adds optional `band`, `ci95_low`, `ci95_high` kw-only parameters to `_NfrRecorder.record_metric` and emits a new per-metric report.csv artifact (one row per scenario × metric; columns: scenario_id, metric_name, value, value_band, ci95_low, ci95_high, ac_id, outcome). Backwards compatible: existing 4-arg callers are unchanged; an unbalanced ci95 pair raises ValueError.

report.csv is written once per pytest session from `pytest_sessionfinish`, so the annotation pass runs once per CI invocation regardless of (fc_adapter, vio_strategy) (AC-3). `regression-baseline.json` is intentionally kept flat to preserve the diff contract used by regression-detection tooling.

NFT-RES-03 + NFT-PERF-01 scenarios are updated to pass real bands and compute an empirical 2.5/97.5-percentile ci95 from their own sample streams (per-iteration envelope ratios for Monte Carlo, per-frame latency samples for N-sample latency).

Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88 for AZ-446: band/CI behavior, ValueError on unbalanced ci95, report.csv columns, explicit-path override, and end-to-end emission via the pytest plugin).

Code review: PASS_WITH_WARNINGS. 1 Low (empirical-CI semantics, documented inline), 1 Medium carried over from batch 88's cumulative-review backlog (write_csv_evidence + _resolve_fixture_path duplication is outside AZ-446 reporting scope).

This commit closes Step 10 Implement Tests for cycle 1 (41 of 41 blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to Step 11 Run Tests next.

Co-authored-by: Cursor <cursoragent@cursor.com>
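
The recorder behavior described in this message can be sketched roughly as follows. This is a minimal illustration, not the project's code: `RecorderSketch`, its constructor, `to_csv`, and the choice of the four positional arguments are assumptions. Only the keyword-only `band` / `ci95_low` / `ci95_high` parameters, the ValueError on an unbalanced pair, and the column list come from the message above.

```python
import csv
import io

# Column order as listed in the commit message.
REPORT_COLUMNS = [
    "scenario_id", "metric_name", "value", "value_band",
    "ci95_low", "ci95_high", "ac_id", "outcome",
]


class RecorderSketch:
    """Hypothetical stand-in for _NfrRecorder (names and layout are assumed)."""

    def __init__(self, scenario_id):
        self.scenario_id = scenario_id
        self.rows = []

    def record_metric(self, metric_name, value, ac_id, outcome,
                      *, band=None, ci95_low=None, ci95_high=None):
        # The CI bounds only make sense as a pair: reject one without the other.
        if (ci95_low is None) != (ci95_high is None):
            raise ValueError("ci95_low and ci95_high must be given together")
        self.rows.append({
            "scenario_id": self.scenario_id,
            "metric_name": metric_name,
            "value": value,
            "value_band": band,
            "ci95_low": ci95_low,
            "ci95_high": ci95_high,
            "ac_id": ac_id,
            "outcome": outcome,
        })

    def to_csv(self):
        # One row per scenario x metric; the csv module writes None as "".
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=REPORT_COLUMNS)
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()
```

A caller that passes only the four positional arguments gets a row with empty `value_band`, `ci95_low`, and `ci95_high` cells, which is one way the backwards compatibility mentioned above could look in practice.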
@@ -1,51 +0,0 @@
# CSV reporter refinements + per-NFR machine-readable trend lines
**Task**: AZ-446_csv_reporter_refinements
**Name**: Trend-line + acceptance-band annotations + Monte Carlo confidence intervals
**Description**: Add trend-line + acceptance-band annotations to the CSV reporter (so each numeric metric carries the AC's threshold inline), and emit Monte Carlo confidence intervals where applicable (NFT-RES-03, NFT-PERF-01).
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-445
**Component**: Blackbox Tests / Reporting (epic AZ-262)
**Tracker**: AZ-446
**Epic**: AZ-262 (E-BBT)
## Problem
A bare numeric metric without its acceptance band is hard to read. Adding the band inline (e.g., `p95_latency_ms = 387 (≤400 budget)`) makes the report immediately legible. Monte Carlo runs additionally need confidence intervals so a single iteration's outlier doesn't drive false-positive failures.
## Outcome
- `report.csv` columns extended: every numeric column gets a paired `_band` column (e.g., `p95_latency_ms`, `p95_latency_ms_band`).
- Monte Carlo CI: NFT-RES-03 + NFT-PERF-01 emit paired `_ci95_low` + `_ci95_high` columns.
## Scope
### Included
- Band annotation per numeric column (read from each scenario's task spec).
- CI-95 columns for the Monte Carlo + N-sample scenarios.
### Excluded
- Visual report rendering (HTML, etc.) — out of scope.
## Acceptance Criteria
**AC-1: band annotation**
Given any numeric metric column in `report.csv`
Then a paired `_band` column exists with the AC threshold text (e.g., `≤400 ms`).
**AC-2: CI-95 for Monte Carlo**
Given NFT-RES-03 emits 100 iterations
Then `report.csv` contains `_ci95_low` + `_ci95_high` paired columns for each metric.
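
The commit message describes the CI as an empirical 2.5/97.5-percentile range taken from each scenario's own sample stream. A minimal sketch of that computation is shown here, assuming linear interpolation between order statistics; the function names are hypothetical, and the project may use a different percentile method.

```python
def percentile(sorted_samples, q):
    """Linear-interpolation percentile of pre-sorted data, q in [0, 100]."""
    if not sorted_samples:
        raise ValueError("need at least one sample")
    pos = (len(sorted_samples) - 1) * q / 100.0
    lo = int(pos)                                  # lower neighbor index
    hi = min(lo + 1, len(sorted_samples) - 1)      # upper neighbor, clamped
    frac = pos - lo
    return sorted_samples[lo] + (sorted_samples[hi] - sorted_samples[lo]) * frac


def empirical_ci95(samples):
    """Return (ci95_low, ci95_high) as the 2.5th and 97.5th sample percentiles."""
    ordered = sorted(samples)
    return percentile(ordered, 2.5), percentile(ordered, 97.5)
```

Note that this interval describes the spread of the samples themselves (for example, 100 per-iteration envelope ratios), not a confidence interval on their mean.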
**AC-3: parameterization**
Given conftest parameterization
Then this annotation pass runs once per CI invocation regardless of `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
Reporting-only.
## Document Dependencies
- `_docs/02_document/tests/test-data.md` § Reporting & Evidence
- Each scenario task's AC numeric thresholds (read at report time)