mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 07:21:13 +00:00
[AZ-446] CSV reporter: band + ci95 annotations + report.csv emitter
Batch 89: adds optional `band`, `ci95_low`, `ci95_high` kw-only parameters to `_NfrRecorder.record_metric` and emits a new per-metric report.csv artifact (one row per scenario × metric; columns: scenario_id, metric_name, value, value_band, ci95_low, ci95_high, ac_id, outcome). Backwards compatible: existing 4-arg callers are unchanged; an unbalanced ci95 pair raises ValueError.

report.csv is written once per pytest session from `pytest_sessionfinish`, so the annotation pass runs once per CI invocation regardless of (fc_adapter, vio_strategy) (AC-3). `regression-baseline.json` is intentionally kept flat to preserve the diff contract used by regression-detection tooling.

The NFT-RES-03 and NFT-PERF-01 scenarios now pass real bands and compute an empirical 2.5/97.5-percentile ci95 from their own sample streams (per-iteration envelope ratios for Monte Carlo, per-frame latency samples for N-sample latency).

Tests: 1229 e2e/_unit_tests pass (+6 vs. batch 88, covering AZ-446 band/CI behavior, ValueError on an unbalanced ci95 pair, report.csv columns, explicit-path override, and end-to-end emission via the pytest plugin).

Code review: PASS_WITH_WARNINGS. 1 Low (empirical-CI semantics, documented inline), 1 Medium carried over from batch 88's cumulative-review backlog (`write_csv_evidence` + `_resolve_fixture_path` duplication is outside AZ-446 reporting scope).

This commit closes Step 10 Implement Tests for cycle 1 (41 of 41 blackbox-test tasks done, AZ-406..AZ-446). Greenfield auto-chains to Step 11 Run Tests next.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,58 @@
# Batch Report

**Batch**: 89
**Tasks**: AZ-446 (CSV reporter refinements: trend-line + acceptance-band annotations + Monte Carlo CI)
**Date**: 2026-05-17
**Cycle**: 1
**Complexity**: 2 points

## Task Results

| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-446_csv_reporter_refinements | Done | 1 source (nfr_recorder), 2 scenarios (nft_res_03 + nft_perf_01), 1 unit test (test_nfr_recorder) | pass | 3/3 | F1 Low (CI naming semantics, in-scope), F2 Medium (carry-over from batch 88, not in scope) |

## AC Test Coverage: All covered (3 of 3 ACs)

## Code Review Verdict: PASS_WITH_WARNINGS

See `_docs/03_implementation/reviews/batch_89_review.md`. 0 Critical / 0 High / 1 Medium (cumulative-review carry-over from batches 85–88: `write_csv_evidence` + `_resolve_fixture_path` duplication is outside AZ-446 scope and surfaces again at the next cumulative review) / 1 Low (empirical-CI naming semantics, documented in the `nfr_recorder` docstring).
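
For reference on the Low finding, an empirical 2.5/97.5-percentile band over a sample stream could be sketched as follows. `empirical_ci95` is a hypothetical helper, not a function from the repository, and the use of `statistics.quantiles` with the inclusive method is an assumption; the scenarios may compute their percentiles differently.

```python
import statistics


def empirical_ci95(samples):
    """Return the (2.5th, 97.5th) percentiles of `samples`.

    Hypothetical helper for illustration only. The result describes the
    spread of the observed samples, not the uncertainty of an estimator.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=40 yields 39 cut points spaced 2.5% apart; the first is the
    # 2.5th percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(samples, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

A band computed this way covers the central 95% of the samples themselves, which is a different quantity from a 95% confidence interval on a mean or other statistic. That distinction may be what the naming finding refers to.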

## Auto-Fix Attempts: 0

## Stuck Agents: None

## Test Results

- `e2e/_unit_tests/reporting/test_nfr_recorder.py`: 14 tests pass (8 pre-existing + 6 new for AZ-446 band/CI behavior).
- Full e2e unit-test suite: **1229 passed in 134 s** (+6 vs. batch 88).
- No scenario-level regressions: NFT-RES-03 + NFT-PERF-01 scenarios continue to skip cleanly via `sitl_replay_ready` / `tier2_only` gating in the Tier-1 docker harness.

## API Change Summary

`_NfrRecorder.record_metric` (and the underlying `_RunAggregator`):

```python
record_metric(
    name, value, ac_id=None, *,
    band: str | None = None,
    ci95_low: float | None = None,
    ci95_high: float | None = None,
)
```

- All new params are kw-only and default to `None`: fully backwards compatible.
- An unbalanced `ci95_*` pair raises `ValueError`.
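
The contract in the two bullets above can be illustrated with a minimal standalone sketch. It is not the repository's `_NfrRecorder` implementation: `MetricRecord` is a hypothetical container, the function is written as a free function rather than a method, and reading "unbalanced" as "exactly one of the two bounds supplied" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    """Hypothetical container for one recorded metric (not from the repo)."""

    name: str
    value: float
    ac_id: str | None = None
    band: str | None = None
    ci95_low: float | None = None
    ci95_high: float | None = None


def record_metric(
    name: str,
    value: float,
    ac_id: str | None = None,
    *,
    band: str | None = None,
    ci95_low: float | None = None,
    ci95_high: float | None = None,
) -> MetricRecord:
    # Assumption: "unbalanced" means exactly one bound of the pair was given.
    if (ci95_low is None) != (ci95_high is None):
        raise ValueError("ci95_low and ci95_high must be supplied together")
    return MetricRecord(name, value, ac_id, band, ci95_low, ci95_high)
```

Because the new parameters sit after the bare `*`, a positional call such as `record_metric(name, value, ac_id)` keeps its meaning, and the annotations can only be passed by keyword.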

New artifact:

- `<evidence_dir>/report.csv` (one row per (scenario, metric)), with
  columns `scenario_id, metric_name, value, value_band, ci95_low,
  ci95_high, ac_id, outcome`. Emitted once per pytest session by
  `_PluginHooks.pytest_sessionfinish` (AC-3).
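
A minimal sketch of an emitter for this column layout, using only the standard-library `csv` module, might look like the following. The column names come from the list above; `write_report_csv`, its arguments, and the choice to leave missing annotations as empty cells are assumptions for illustration, not the plugin's actual code.

```python
import csv
from pathlib import Path

# Column order as listed in the report above.
REPORT_COLUMNS = [
    "scenario_id", "metric_name", "value", "value_band",
    "ci95_low", "ci95_high", "ac_id", "outcome",
]


def write_report_csv(rows, path):
    """Write one CSV row per (scenario, metric) dict in `rows`.

    Hypothetical helper: keys that are missing or None become empty cells.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
        writer.writeheader()
        for row in rows:
            cells = {}
            for column in REPORT_COLUMNS:
                cell = row.get(column)
                cells[column] = "" if cell is None else cell
            writer.writerow(cells)
    return path
```

Calling a function like this from a session-level hook, rather than per test or per parametrization, is what keeps the file to a single write per pytest session.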

`regression-baseline.json` schema is unchanged (flat `{metric:
numeric}`) to preserve the diff contract used by regression-detection
tooling.
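
To show why a flat mapping is convenient for diffing, here is a small sketch of a comparison over two `{metric: numeric}` dictionaries. `diff_baselines` is a hypothetical function and does not describe the actual regression-detection tooling, whose interface is not shown in this report.

```python
def diff_baselines(old, new):
    """Return {metric: (old_value, new_value)} for metrics whose values differ.

    Hypothetical helper: a metric present in only one mapping is reported
    with None on the missing side.
    """
    changed = {}
    for metric in sorted(old.keys() | new.keys()):
        before, after = old.get(metric), new.get(metric)
        if before != after:
            changed[metric] = (before, after)
    return changed
```

With flat numeric values, a plain equality check per key is enough. Nesting band or CI annotations under each metric would force every consumer of the baseline to unpack a richer structure first.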

## Next Batch: None. All selected test-implementation tasks are done (Step 10 Implement Tests complete for cycle 1).