[AZ-440] [AZ-441] [AZ-442] [AZ-443] NFT-LIM-01/02/03+05/04 blackbox scenarios

Batch 88: adds four resource-limit blackbox scenarios + pure-logic helpers + unit tests:

- NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets; AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams.
- NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation against 50 GiB; ±60 s replay-window slack for AC-1.
- NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE): aggregate ≤ 100 GiB across tile-cache + tile-cache-write + fdr-output; thumbnail-log < 1 GiB strict 8 h-extrapolated.
- NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99 ≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written to traceability-status.json.

Thresholds fixture lives at e2e/fixtures/jetson/thermal-thresholds.json (moved from the task spec's suggested tests/fixtures/ path so the file stays inside the blackbox_tests Owns: e2e/** envelope).

All four helpers are public-boundary-only (no src/gps_denied_onboard imports).

Scenarios skip cleanly in the Tier-1 docker harness pending AZ-595 (SITL replay builder) for the four shared fixture inputs and AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios.

Code review: PASS_WITH_WARNINGS (0/0/2/1). Both Mediums are carried-over write_csv_evidence + _resolve_fixture_path duplication, deferred to AZ-446 (batch 89). Low is the self-resolved AZ-443 fixture ownership drift documented in the review.

Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new directory-layout entry); 24 resource_limit scenarios collect and skip cleanly under runner/pytest.ini.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 19:01:14 +00:00 · 2026-05-17 18:01:55 +03:00
parent d1e30f818f
commit 6e4a575221
22 changed files with 2785 additions and 4 deletions
@@ -1,77 +0,0 @@
-# NFT-LIM-01 — Jetson memory budget
-
-**Task**: AZ-440_nft_lim_01_jetson_memory
-**Name**: Steady-state memory ≤ 4.5 GB (Plan A) / ≤ 6.0 GB (Plan B); peak ≤ 5.0 GB / 6.5 GB; no OOM (AC-NEW-13)
-**Description**: Implement NFT-LIM-01 — Tier-2 ONLY; 5 min Derkachi replay + 30 s warm-up; sample memory at 1 Hz from `/proc/<pid>/status` AND `tegrastats`; assert steady-state and peak budgets per Plan A / Plan B; no OOM kills observed.
-**Complexity**: 3 points
-**Dependencies**: AZ-406, AZ-407, AZ-444
-**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
-**Tracker**: AZ-440
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Jetson Orin Nano Super has 8 GB RAM total; the SUT must operate within Plan A (4.5 GB) or Plan B (6.0 GB) budgets to leave headroom for OS + suite. AC-NEW-13 prescribes the budgets and Plan-A/Plan-B switching rules.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_01_jetson_memory.py`. Tier-2 ONLY.
- 5 min Derkachi replay + 30 s warm-up.
- Per-second memory sample from (a) `/proc/<pid>/status` `VmRSS`, (b) `tegrastats` (system-level memory).
- Compute steady-state (post-warm-up p50) and peak (post-warm-up max).
- Assert `steady_state ≤ 4.5 GB` (Plan A default) AND `peak ≤ 5.0 GB`. Plan B (6.0 / 6.5 GB) gated behind `MEMORY_PLAN=B` flag.
- Assert no OOM kills in `dmesg` during the run.
-
-## Scope
-
-### Included
- `/proc/<pid>/status` + `tegrastats` sampling.
- Steady-state + peak computation.
- OOM detection in `dmesg`.
-
-### Excluded
- FDR size budget — owned by NFT-LIM-02 (AZ-441).
- Tier-1 memory measurement — irrelevant (Docker on x86 has different budgets).
-
-## Acceptance Criteria
-
-**AC-1: tier guard**
-Given `tier == tier1-docker`
-Then the test SKIPs.
-
-**AC-2: steady-state budget (Plan A)**
-Given the post-warm-up samples
-Then `p50(VmRSS) ≤ 4.5 GB` AND `p50(tegrastats system memory) ≤ 4.5 GB`.
-
-**AC-3: peak budget (Plan A)**
-Given the same samples
-Then `max(VmRSS) ≤ 5.0 GB`.
-
-**AC-4: no OOM kills**
-Given `dmesg --since "<run_start>"`
-Then no entries match `oom-killer` or `Killed process .*gps-denied-onboard`.
-
-**AC-5: Plan B gated**
-Given `MEMORY_PLAN=B` env flag
-Then the budgets relax to `steady ≤ 6.0 GB` AND `peak ≤ 6.5 GB`.
-
-**AC-6: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end on real hardware through public boundaries.
-
- **Allowed**: `/proc`, `tegrastats` (public OS / NVIDIA telemetry), `dmesg`.
- **Forbidden**: instrumenting SUT internal allocators.
-
-## Constraints
-
- Tier-2 only.
- Plan A / Plan B switching is per the SUT's documented config; the test does NOT trigger the switch — it observes and reports the active plan.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-01
- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-01 row)
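Reviewer note: the budget logic in the NFT-LIM-01 spec above (AC-2 through AC-5) can be sketched as a pure-logic helper. This is a minimal sketch under stated assumptions, not the helper shipped in this commit. The function names are hypothetical, "GB" is read as GiB, p50 is taken as the median, and only the `VmRSS` stream is shown (the `tegrastats` stream would need its own parser).

```python
import os
import re
import statistics

KB_PER_GIB = 1024 * 1024  # /proc/<pid>/status reports VmRSS in kB

# (steady_state, peak) budgets per plan, taken from the spec; unit assumed GiB.
BUDGETS_GIB = {"A": (4.5, 5.0), "B": (6.0, 6.5)}

_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)
# Patterns quoted from AC-4 of the spec.
_OOM = re.compile(r"oom-killer|Killed process .*gps-denied-onboard")


def parse_vmrss_kb(status_text):
    """Extract VmRSS (kB) from the text of /proc/<pid>/status."""
    match = _VMRSS.search(status_text)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def evaluate_memory(samples, warmup_s=30.0, plan=None):
    """Check post-warm-up samples against the active plan's budgets.

    samples: list of (seconds_since_run_start, vmrss_kb) tuples.
    plan: "A" or "B"; falls back to the MEMORY_PLAN env flag, then "A".
    """
    plan = plan or os.environ.get("MEMORY_PLAN", "A")
    steady_budget, peak_budget = BUDGETS_GIB[plan]
    # Drop everything recorded during the warm-up window.
    steady = [kb / KB_PER_GIB for t, kb in samples if t >= warmup_s]
    if not steady:
        raise ValueError("no samples after the warm-up window")
    p50 = statistics.median(steady)
    peak = max(steady)
    return {
        "plan": plan,
        "steady_gib": p50,
        "peak_gib": peak,
        "steady_ok": p50 <= steady_budget,
        "peak_ok": peak <= peak_budget,
    }


def oom_lines(dmesg_text):
    """Return the dmesg lines that match the AC-4 OOM patterns."""
    return [line for line in dmesg_text.splitlines() if _OOM.search(line)]
```

Keeping the parsing and the budget check free of I/O means both can be unit-tested on x86 even though the scenario itself only runs on Tier-2 hardware.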
@@ -1,57 +0,0 @@
-# NFT-LIM-02 — FDR size budget
-
-**Task**: AZ-441_nft_lim_02_fdr_size
-**Name**: 8 h-extrapolated FDR size ≤ 50 GB (AC-7.3)
-**Description**: Implement NFT-LIM-02 — replay 30 min Derkachi at typical fixed rates; measure FDR-archive growth (`du -sh fdr-output`); extrapolate to 8 h linearly; assert `8h_extrapolated_size ≤ 50 GB`.
-**Complexity**: 2 points
-**Dependencies**: AZ-406, AZ-407
-**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
-**Tracker**: AZ-441
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-FDR size budget (AC-7.3) protects the on-board storage from being filled mid-flight. A 30 min run extrapolated to 8 h is the canonical measurement.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_02_fdr_size.py`. Tier-1 OR Tier-2.
- 30 min Derkachi replay (loop the 8 min flight ~4×); per-minute `du -sh fdr-output` sampling.
- Linear extrapolation to 8 h: `extrapolated_size = (size_at_30min / 30) × (8 × 60)`.
- Assert `extrapolated_size ≤ 50 GB`.
-
-## Scope
-
-### Included
- 30 min replay (looped 8 min × 4).
- Per-minute size sampling.
- Linear extrapolation.
-
-### Excluded
- Storage budget for tile cache + tiles → owned by NFT-LIM-03 (AZ-442).
-
-## Acceptance Criteria
-
-**AC-1: 30 min replay**
-Given the test runs
-Then the runner loops Derkachi for 30 min wall-clock.
-
-**AC-2: extrapolation budget**
-Given per-minute samples
-Then `(size_at_30min / 30) × 480 ≤ 50 GB`.
-
-**AC-3: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: `du -sh` of mounted volumes.
- **Forbidden**: importing FDR writer state.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-02
- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-02 row)
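Reviewer note: the AC-2 arithmetic in the NFT-LIM-02 spec above, `(size_at_30min / 30) × 480`, can be sketched as follows. This is an illustrative sketch rather than the committed helper. The function names are hypothetical, the 50 GB budget is read as 50 GiB and the ±60 s replay-window slack is taken from the commit message, and sizes are assumed to be in bytes (for example from `du -sb` rather than the human-readable `du -sh`).

```python
GIB = 1024 ** 3


def extrapolate_linear(size_bytes, elapsed_min, target_min=480.0):
    """Scale a size measured after elapsed_min minutes to target_min minutes.

    Equivalent to (size / elapsed) * target; the multiplication is done
    first so that exact inputs stay exact in floating point.
    """
    if elapsed_min <= 0:
        raise ValueError("elapsed_min must be positive")
    return size_bytes * target_min / elapsed_min


def fdr_within_budget(size_at_30min_bytes, budget_bytes=50 * GIB):
    """AC-2: the 8 h-extrapolated FDR size must not exceed the budget."""
    return extrapolate_linear(size_at_30min_bytes, 30.0) <= budget_bytes


def replay_window_ok(elapsed_s, target_s=1800.0, slack_s=60.0):
    """AC-1: the replay ran for 30 min wall-clock, within the assumed slack."""
    return abs(elapsed_s - target_s) <= slack_s
```

As a worked example, 3 GiB written in 30 min extrapolates to 3 × 16 = 48 GiB over 8 h, which is inside the budget; the break-even point is 3.125 GiB at the 30 min mark.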
@@ -1,55 +0,0 @@
-# NFT-LIM-03 + NFT-LIM-05 — Aggregate storage budget + thumbnail-log budget
-
-**Task**: AZ-442_nft_lim_03_05_storage_budget
-**Name**: Aggregate on-disk storage + thumbnail-log budget (AC-7.4 / AC-NEW-12 / RESTRICT-STORAGE)
-**Description**: Combined coverage for two storage-budget scenarios that share the same volume measurement: NFT-LIM-03 (aggregate `tile-cache + tile-cache-write + fdr-output ≤ 100 GB`) and NFT-LIM-05 (thumbnail-log size component < 1 GB / 8 h).
-**Complexity**: 2 points
-**Dependencies**: AZ-406, AZ-407
-**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
-**Tracker**: AZ-442
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-The aggregate storage budget bounds how much disk the SUT may consume on the companion. Two related scenarios share this measurement and are combined to avoid duplication.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_03_05_storage_budget.py`. Tier-1 OR Tier-2.
- 30 min Derkachi replay; per-minute `du -sh` of `tile-cache`, `tile-cache-write`, `fdr-output`, and the thumbnail-log subdirectory.
- NFT-LIM-03: assert `aggregate(tile-cache + tile-cache-write + fdr-output) ≤ 100 GB`.
- NFT-LIM-05: extrapolate thumbnail-log subdirectory to 8 h; assert `extrapolated_thumbnail_log < 1 GB`.
-
-## Scope
-
-### Included
- 30 min replay (shareable with NFT-LIM-02 if combined in CI orchestration).
- Per-volume size sampling.
- Both budget assertions.
-
-### Excluded
- FDR-only size — owned by NFT-LIM-02.
- Mid-flight tile generation rate — owned by FT-P-17 (AZ-422).
-
-## Acceptance Criteria
-
-**AC-1: aggregate budget**
-Given a 30 min replay
-Then `du -sh tile-cache + tile-cache-write + fdr-output ≤ 100 GB` at the end of the run.
-
-**AC-2: thumbnail-log 8 h budget**
-Given the per-minute samples of the thumbnail-log subdirectory
-Then `(size_at_30min_thumb / 30) × 480 < 1 GB`.
-
-**AC-3: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-Same as NFT-LIM-02.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-03, § NFT-LIM-05
- `_docs/02_document/tests/test-data.md` § Resource Limits
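Reviewer note: the two assertions in the NFT-LIM-03 + NFT-LIM-05 spec above differ in one detail that is easy to miss: the aggregate budget is inclusive (`≤`), while the thumbnail-log budget is strict (`<`). A minimal sketch, assuming sizes in bytes, GB read as GiB as in the commit message, and hypothetical function names:

```python
GIB = 1024 ** 3

# Volumes counted toward the aggregate budget, as listed in AC-1.
AGGREGATE_VOLUMES = ("tile-cache", "tile-cache-write", "fdr-output")


def aggregate_ok(sizes_bytes, budget_bytes=100 * GIB):
    """NFT-LIM-03: the summed end-of-run size of the three volumes is <= budget.

    sizes_bytes: mapping of volume name to size in bytes.
    """
    missing = [name for name in AGGREGATE_VOLUMES if name not in sizes_bytes]
    if missing:
        raise KeyError(f"missing volume sizes: {missing}")
    total = sum(sizes_bytes[name] for name in AGGREGATE_VOLUMES)
    return total <= budget_bytes


def thumbnail_log_ok(size_at_30min_bytes, budget_bytes=1 * GIB):
    """NFT-LIM-05: the 8 h-extrapolated thumbnail log is strictly below budget."""
    extrapolated = size_at_30min_bytes * 480 / 30
    return extrapolated < budget_bytes
```

With these definitions a thumbnail log of exactly 64 MiB after 30 min extrapolates to exactly 1 GiB and therefore fails, whereas an aggregate of exactly 100 GiB passes.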
@@ -1,66 +0,0 @@
-# NFT-LIM-04 — Thermal envelope on Jetson
-
-**Task**: AZ-443_nft_lim_04_thermal
-**Name**: Sustained thermal headroom + AC-NEW-5 PARTIAL acceptance (AC-NEW-5)
-**Description**: Implement NFT-LIM-04 — Tier-2 ONLY; sustained 30 min Derkachi loop at workstation ambient; record CPU/GPU/SoC temperatures via `tegrastats`; assert no thermal throttling kicks in. Mark as PARTIAL coverage of AC-NEW-5 (the +50 °C chamber portion is a separate release-gate scenario).
-**Complexity**: 2 points
-**Dependencies**: AZ-406, AZ-407, AZ-444
-**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
-**Tracker**: AZ-443
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Thermal behavior in workstation ambient is a partial coverage of AC-NEW-5 — it cannot prove the +50 °C envelope but it can flag obvious thermal regressions.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_04_thermal.py`. Tier-2 ONLY.
- 30 min Derkachi loop; per-second `tegrastats` capture; assert no thermal-throttling event in `dmesg` AND p99 `cpu_temp ≤ T_throttle - 5 °C` (5 °C headroom).
- Annotate `traceability-matrix.md` AC-NEW-5 status as PARTIAL (chamber required for full).
-
-## Scope
-
-### Included
- 30 min loop.
- `tegrastats` + `dmesg` capture.
- PARTIAL annotation in evidence bundle.
-
-### Excluded
- +50 °C chamber portion — owned by a separate release-gate scenario, not in this CI scope.
-
-## Acceptance Criteria
-
-**AC-1: tier guard**
-Given `tier == tier1-docker`
-Then the test SKIPs.
-
-**AC-2: no thermal throttle**
-Given the 30 min loop
-Then `dmesg --since "<run_start>"` shows no entries matching `thermal_throttle` / `tegra_thermal_zone`.
-
-**AC-3: 5 °C headroom**
-Given the per-second `tegrastats` samples
-Then `p99(cpu_temp) ≤ T_throttle - 5 °C` AND `p99(soc_temp) ≤ T_throttle - 5 °C`. T_throttle is the hardware-documented value (97 °C for CPU, 95 °C for SoC on Orin Nano per NVIDIA docs; sourced from a fixture file at runtime).
-
-**AC-4: PARTIAL annotation**
-Given the test completes
-Then the evidence bundle includes a `traceability-status.json` entry `"AC-NEW-5": "PARTIAL — chamber required for full"`.
-
-**AC-5: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-Same as NFT-LIM-01.
-
-## Constraints
-
- Tier-2 only.
- T_throttle is read from a fixture file (`tests/fixtures/jetson-thermal-thresholds.json`) so future Jetson hardware updates require only a fixture bump.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-04
- `_docs/02_document/tests/traceability-matrix.md` § AC-NEW-5 (PARTIAL annotation)
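Reviewer note: the headroom check (AC-3) and the throttle-event scan (AC-2) in the NFT-LIM-04 spec above can be sketched as below. This is a sketch under assumptions, not the committed helper: the function names are hypothetical, p99 is computed with the nearest-rank method (the spec does not say which percentile definition to use), and the thresholds are passed in as plain numbers because the schema of the thresholds fixture is not shown here.

```python
import math
import re

# Patterns quoted from AC-2 of the spec.
_THROTTLE = re.compile(r"thermal_throttle|tegra_thermal_zone")


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with >= pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def headroom_ok(temps_c, t_throttle_c, headroom_c=5.0):
    """AC-3: p99 temperature stays at least headroom_c below the throttle point."""
    return percentile_nearest_rank(temps_c, 99) <= t_throttle_c - headroom_c


def throttle_lines(dmesg_text):
    """AC-2: return the dmesg lines that indicate a thermal-throttle event."""
    return [line for line in dmesg_text.splitlines() if _THROTTLE.search(line)]
```

Using the spec's thresholds (97 °C for CPU, 95 °C for SoC), the limits work out to 92 °C and 90 °C respectively, so a p99 of 91 °C would pass on the CPU sensor and fail on the SoC sensor.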