[AZ-440] [AZ-441] [AZ-442] [AZ-443] NFT-LIM-01/02/03+05/04 blackbox scenarios

Batch 88: adds four resource-limit blackbox scenarios + pure-logic helpers + unit tests:

- NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets; AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams.
- NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation against 50 GiB; ±60 s replay-window slack for AC-1.
- NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE): aggregate ≤ 100 GiB across tile-cache + tile-cache-write + fdr-output; thumbnail-log < 1 GiB strict 8 h-extrapolated.
- NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99 ≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written to traceability-status.json.

Thresholds fixture lives at e2e/fixtures/jetson/thermal-thresholds.json (moved from the task spec's suggested tests/fixtures/ path so the file stays inside the blackbox_tests Owns: e2e/** envelope).

All four helpers are public-boundary-only (no src/gps_denied_onboard imports). Scenarios skip cleanly in the Tier-1 docker harness pending AZ-595 (SITL replay builder) for the four shared fixture inputs and AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios.

Code review: PASS_WITH_WARNINGS (0/0/2/1). Both Mediums are carried-over write_csv_evidence + _resolve_fixture_path duplication, deferred to AZ-446 (batch 89). Low is the self-resolved AZ-443 fixture ownership drift documented in the review.

Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new directory-layout entry); 24 resource_limit scenarios collect and skip cleanly under runner/pytest.ini.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 18:51:15 +00:00 · 2026-05-17 18:01:55 +03:00
parent d1e30f818f
commit 6e4a575221
22 changed files with 2785 additions and 4 deletions
@@ -0,0 +1,77 @@
+# NFT-LIM-01 — Jetson memory budget
+
+**Task**: AZ-440_nft_lim_01_jetson_memory
+**Name**: Steady-state memory ≤ 4.5 GB (Plan A) / ≤ 6.0 GB (Plan B); peak ≤ 5.0 GB / 6.5 GB; no OOM (AC-NEW-13)
+**Description**: Implement NFT-LIM-01 — Tier-2 ONLY; 5 min Derkachi replay + 30 s warm-up; sample memory at 1 Hz from `/proc/<pid>/status` AND `tegrastats`; assert steady-state and peak budgets per Plan A / Plan B; no OOM kills observed.
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407, AZ-444
+**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
+**Tracker**: AZ-440
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The Jetson Orin Nano Super has 8 GB of RAM in total; the SUT must operate within the Plan A (4.5 GB) or Plan B (6.0 GB) budget to leave headroom for OS + suite. AC-NEW-13 prescribes the budgets and the Plan-A/Plan-B switching rules.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_01_jetson_memory.py`. Tier-2 ONLY.
+- 5 min Derkachi replay + 30 s warm-up.
+- Per-second memory sample from (a) `/proc/<pid>/status` `VmRSS`, (b) `tegrastats` (system-level memory).
+- Compute steady-state (post-warm-up p50) and peak (post-warm-up max).
+- Assert `steady_state ≤ 4.5 GB` (Plan A default) AND `peak ≤ 5.0 GB`. Plan B (6.0 / 6.5 GB) gated behind `MEMORY_PLAN=B` flag.
+- Assert no OOM kills in `dmesg` during the run.
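Reviewer sketch (not part of the commit): the steady-state and peak figures listed above could be derived roughly as follows. The function names, the `(t_seconds, value)` sample shape, and the choice of `statistics.median` as the p50 estimator are assumptions made for illustration; the commit's actual helpers are not shown in this diff.

```python
import re
from statistics import median

# /proc/<pid>/status reports the resident set as a line like "VmRSS:\t  123456 kB".
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in the kB units /proc reports) from status text."""
    match = _VMRSS_RE.search(status_text)
    if match is None:
        raise ValueError("VmRSS line not found in /proc status text")
    return int(match.group(1))


def steady_and_peak(samples, warmup_s: float = 30.0):
    """Return (p50, max) over samples taken at or after the warm-up window.

    `samples` is an iterable of (seconds_since_run_start, value) pairs
    (hypothetical shape, chosen for this sketch).
    """
    kept = [value for t, value in samples if t >= warmup_s]
    if not kept:
        raise ValueError("no samples after the warm-up window")
    return median(kept), max(kept)
```

Samples taken before the 30 s warm-up are dropped before either statistic is computed, so a start-up spike cannot count against the peak budget.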
+
+## Scope
+
+### Included
+- `/proc/<pid>/status` + `tegrastats` sampling.
+- Steady-state + peak computation.
+- OOM detection in `dmesg`.
+
+### Excluded
+- FDR size budget — owned by NFT-LIM-02 (AZ-441).
+- Tier-1 memory measurement — irrelevant (Docker on x86 has different budgets).
+
+## Acceptance Criteria
+
+**AC-1: tier guard**
+Given `tier == tier1-docker`
+Then the test SKIPs.
+
+**AC-2: steady-state budget (Plan A)**
+Given the post-warm-up samples
+Then `p50(VmRSS) ≤ 4.5 GB` AND `p50(tegrastats system memory) ≤ 4.5 GB`.
+
+**AC-3: peak budget (Plan A)**
+Given the same samples
+Then `max(VmRSS) ≤ 5.0 GB`.
+
+**AC-4: no OOM kills**
+Given `dmesg --since "<run_start>"`
+Then no entries match `oom-killer` or `Killed process .*gps-denied-onboard`.
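Reviewer sketch (not part of the commit): one way to express the AC-4 scan over already-captured `dmesg` text. The two patterns are taken from the criterion above; `find_oom_lines` is a hypothetical helper name, and capturing the `dmesg` output itself is left out.

```python
import re

# Patterns quoted from AC-4; everything else here is illustrative.
OOM_PATTERNS = (
    re.compile(r"oom-killer"),
    re.compile(r"Killed process .*gps-denied-onboard"),
)


def find_oom_lines(dmesg_text: str) -> list[str]:
    """Return every dmesg line that matches at least one OOM pattern."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]
```

A scenario would then assert that `find_oom_lines(captured_text)` is empty, and could attach the returned lines to the evidence bundle on failure.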
+
+**AC-5: Plan B gated**
+Given `MEMORY_PLAN=B` env flag
+Then the budgets relax to `steady ≤ 6.0 GB` AND `peak ≤ 6.5 GB`.
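Reviewer sketch (not part of the commit): the Plan A / Plan B gating in AC-2, AC-3 and AC-5 can be modelled as a small lookup keyed by the `MEMORY_PLAN` flag. The budget numbers come from the spec; `MemoryBudget`, `select_budget`, and `within_budget` are hypothetical names, and defaulting to Plan A when the flag is unset follows the "Plan A default" wording in the Outcome section.

```python
import os
from typing import Mapping, NamedTuple


class MemoryBudget(NamedTuple):
    steady_gb: float
    peak_gb: float


# Budgets as stated in the spec (Plan A default, Plan B behind MEMORY_PLAN=B).
BUDGETS = {
    "A": MemoryBudget(steady_gb=4.5, peak_gb=5.0),
    "B": MemoryBudget(steady_gb=6.0, peak_gb=6.5),
}


def select_budget(env: Mapping[str, str] = os.environ) -> MemoryBudget:
    """Pick the budget for the plan named by MEMORY_PLAN (default: Plan A)."""
    plan = env.get("MEMORY_PLAN", "A").strip().upper()
    if plan not in BUDGETS:
        raise ValueError(f"unknown MEMORY_PLAN: {plan!r}")
    return BUDGETS[plan]


def within_budget(steady_gb: float, peak_gb: float, budget: MemoryBudget) -> bool:
    """True when both the steady-state and the peak figures fit the budget."""
    return steady_gb <= budget.steady_gb and peak_gb <= budget.peak_gb
```

Rejecting unknown plan values, rather than silently falling back to Plan A, is a design choice of this sketch and is not stated in the spec.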
+
+**AC-6: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end on real hardware through public boundaries.
+
+- **Allowed**: `/proc`, `tegrastats` (public OS / NVIDIA telemetry), `dmesg`.
+- **Forbidden**: instrumenting SUT internal allocators.
+
+## Constraints
+
+- Tier-2 only.
+- Plan A / Plan B switching is per the SUT's documented config; the test does NOT trigger the switch — it observes and reports the active plan.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-01
+- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-01 row)
@@ -0,0 +1,57 @@
+# NFT-LIM-02 — FDR size budget
+
+**Task**: AZ-441_nft_lim_02_fdr_size
+**Name**: 8 h-extrapolated FDR size ≤ 50 GB (AC-7.3)
+**Description**: Implement NFT-LIM-02 — replay 30 min Derkachi at typical fixed rates; measure FDR-archive growth (`du -sh fdr-output`); extrapolate to 8 h linearly; assert `8h_extrapolated_size ≤ 50 GB`.
+**Complexity**: 2 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
+**Tracker**: AZ-441
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+FDR size budget (AC-7.3) protects the on-board storage from being filled mid-flight. A 30 min run extrapolated to 8 h is the canonical measurement.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_02_fdr_size.py`. Tier-1 OR Tier-2.
+- 30 min Derkachi replay (loop the 8 min flight ~4×); per-minute `du -sh fdr-output` sampling.
+- Linear extrapolation to 8 h: `extrapolated_size = (size_at_30min / 30) × (8 × 60)`.
+- Assert `extrapolated_size ≤ 50 GB`.
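Reviewer sketch (not part of the commit): the linear extrapolation above, written out with a worked number. The spec states the budget as 50 GB while the commit message says 50 GiB; this sketch assumes GiB (2^30 bytes), and the function and constant names are hypothetical.

```python
GIB = 2**30  # assumption: the budget is in GiB, as in the commit message


def extrapolate_linear(size_bytes: float, elapsed_min: float = 30.0,
                       target_min: float = 8 * 60) -> float:
    """Scale a size measured after `elapsed_min` minutes to `target_min` minutes."""
    if elapsed_min <= 0:
        raise ValueError("elapsed_min must be positive")
    return size_bytes * (target_min / elapsed_min)


def fdr_within_budget(size_bytes_at_30min: float, budget_bytes: int = 50 * GIB) -> bool:
    """AC-2 check: the 8 h-extrapolated FDR size must not exceed the budget."""
    return extrapolate_linear(size_bytes_at_30min) <= budget_bytes
```

For example, an archive that has grown to 3 GiB after 30 minutes extrapolates to 3 × 16 = 48 GiB at 8 h, which passes; 3.2 GiB extrapolates to 51.2 GiB, which fails.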
+
+## Scope
+
+### Included
+- 30 min replay (looped 8 min × 4).
+- Per-minute size sampling.
+- Linear extrapolation.
+
+### Excluded
+- Storage budget for tile cache + tiles → owned by NFT-LIM-03 (AZ-442).
+
+## Acceptance Criteria
+
+**AC-1: 30 min replay**
+Given the test runs
+Then the runner loops Derkachi for 30 min wall-clock.
+
+**AC-2: extrapolation budget**
+Given per-minute samples
+Then `(size_at_30min / 30) × 480 ≤ 50 GB`.
+
+**AC-3: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: `du -sh` of mounted volumes.
+- **Forbidden**: importing FDR writer state.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-02
+- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-02 row)
@@ -0,0 +1,55 @@
+# NFT-LIM-03 + NFT-LIM-05 — Aggregate storage budget + thumbnail-log budget
+
+**Task**: AZ-442_nft_lim_03_05_storage_budget
+**Name**: Aggregate on-disk storage + thumbnail-log budget (AC-7.4 / AC-NEW-12 / RESTRICT-STORAGE)
+**Description**: Combined coverage for two storage-budget scenarios that share the same volume measurement: NFT-LIM-03 (aggregate `tile-cache + tile-cache-write + fdr-output ≤ 100 GB`) and NFT-LIM-05 (thumbnail-log size component < 1 GB / 8 h).
+**Complexity**: 2 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
+**Tracker**: AZ-442
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The aggregate storage budget bounds how much disk the SUT may consume on the companion. Two related scenarios share this measurement and are combined to avoid duplication.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_03_05_storage_budget.py`. Tier-1 OR Tier-2.
+- 30 min Derkachi replay; per-minute `du -sh` of `tile-cache`, `tile-cache-write`, `fdr-output`, and the thumbnail-log subdirectory.
+- NFT-LIM-03: assert `aggregate(tile-cache + tile-cache-write + fdr-output) ≤ 100 GB`.
+- NFT-LIM-05: extrapolate thumbnail-log subdirectory to 8 h; assert `extrapolated_thumbnail_log < 1 GB`.
+
+## Scope
+
+### Included
+- 30 min replay (shareable with NFT-LIM-02 if combined in CI orchestration).
+- Per-volume size sampling.
+- Both budget assertions.
+
+### Excluded
+- FDR-only size — owned by NFT-LIM-02.
+- Mid-flight tile generation rate — owned by FT-P-17 (AZ-422).
+
+## Acceptance Criteria
+
+**AC-1: aggregate budget**
+Given a 30 min replay
+Then the summed `du` sizes of `tile-cache`, `tile-cache-write`, and `fdr-output` are ≤ 100 GB at the end of the run.
+
+**AC-2: thumbnail-log 8 h budget**
+Given the per-minute samples of the thumbnail-log subdirectory
+Then `(size_at_30min_thumb / 30) × 480 < 1 GB`.
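Reviewer sketch (not part of the commit): the two storage assertions differ in one detail that is easy to miss. The aggregate budget is inclusive (≤) while the thumbnail-log budget is strict (<). The sketch below assumes sizes are already available in bytes and that budgets are in GiB as in the commit message; all names are hypothetical.

```python
GIB = 2**30  # assumption: budgets are in GiB, as in the commit message
AGGREGATE_VOLUMES = ("tile-cache", "tile-cache-write", "fdr-output")


def aggregate_within_budget(sizes_bytes: dict, budget_bytes: int = 100 * GIB) -> bool:
    """NFT-LIM-03: summed end-of-run size of the three volumes, inclusive bound."""
    missing = [name for name in AGGREGATE_VOLUMES if name not in sizes_bytes]
    if missing:
        raise KeyError(f"missing volume sizes: {missing}")
    return sum(sizes_bytes[name] for name in AGGREGATE_VOLUMES) <= budget_bytes


def thumbnail_log_within_budget(size_bytes_at_end: float, elapsed_min: float = 30.0,
                                budget_bytes: int = 1 * GIB) -> bool:
    """NFT-LIM-05: 8 h-extrapolated thumbnail-log size, strict bound."""
    if elapsed_min <= 0:
        raise ValueError("elapsed_min must be positive")
    extrapolated = size_bytes_at_end * (480.0 / elapsed_min)
    return extrapolated < budget_bytes
```

With a 30 min run the scale factor is 16, so a thumbnail log of exactly 64 MiB extrapolates to exactly 1 GiB and fails the strict check.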
+
+**AC-3: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+Same as NFT-LIM-02.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-03, § NFT-LIM-05
+- `_docs/02_document/tests/test-data.md` § Resource Limits
@@ -0,0 +1,66 @@
+# NFT-LIM-04 — Thermal envelope on Jetson
+
+**Task**: AZ-443_nft_lim_04_thermal
+**Name**: Sustained thermal headroom + AC-NEW-5 PARTIAL acceptance (AC-NEW-5)
+**Description**: Implement NFT-LIM-04 — Tier-2 ONLY; sustained 30 min Derkachi loop at workstation ambient; record CPU/GPU/SoC temperatures via `tegrastats`; assert no thermal throttling kicks in. Mark as PARTIAL coverage of AC-NEW-5 (the +50 °C chamber portion is a separate release-gate scenario).
+**Complexity**: 2 points
+**Dependencies**: AZ-406, AZ-407, AZ-444
+**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
+**Tracker**: AZ-443
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Running at workstation ambient provides only partial coverage of AC-NEW-5: it cannot prove the +50 °C envelope, but it can flag obvious thermal regressions.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_04_thermal.py`. Tier-2 ONLY.
+- 30 min Derkachi loop; per-second `tegrastats` capture; assert no thermal-throttling event in `dmesg` AND p99 of both `cpu_temp` and `soc_temp` ≤ `T_throttle - 5 °C` (5 °C headroom).
+- Annotate `traceability-matrix.md` AC-NEW-5 status as PARTIAL (chamber required for full).
+
+## Scope
+
+### Included
+- 30 min loop.
+- `tegrastats` + `dmesg` capture.
+- PARTIAL annotation in evidence bundle.
+
+### Excluded
+- +50 °C chamber portion — owned by a separate release-gate scenario, not in this CI scope.
+
+## Acceptance Criteria
+
+**AC-1: tier guard**
+Given `tier == tier1-docker`
+Then the test SKIPs.
+
+**AC-2: no thermal throttle**
+Given the 30 min loop
+Then `dmesg --since "<run_start>"` shows no entries matching `thermal_throttle` / `tegra_thermal_zone`.
+
+**AC-3: 5 °C headroom**
+Given the per-second `tegrastats` samples
+Then `p99(cpu_temp) ≤ T_throttle - 5 °C` AND `p99(soc_temp) ≤ T_throttle - 5 °C`, where each sensor is compared against its own T_throttle. T_throttle is the hardware-documented value (97 °C for CPU, 95 °C for SoC on Orin Nano per NVIDIA docs), read from a fixture file at runtime.
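Reviewer sketch (not part of the commit): a headroom check in the spirit of AC-3. The spec does not say which p99 estimator is used, so the nearest-rank definition below is an assumption, as are the function names. Threshold values in the usage note are the ones quoted in AC-3; the real scenario reads them from the fixture file, whose schema is not shown in this diff.

```python
def percentile_nearest_rank(values, pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def has_headroom(samples_c, t_throttle_c: float, headroom_c: float = 5.0) -> bool:
    """True when p99 of the samples is at or below T_throttle minus the headroom."""
    return percentile_nearest_rank(samples_c, 99) <= t_throttle_c - headroom_c
```

With the values quoted in AC-3, CPU samples must have p99 ≤ 92 °C (97 − 5) and SoC samples p99 ≤ 90 °C (95 − 5). Because the check uses p99 rather than the maximum, a single hot sample in a long run does not fail it on its own.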
+
+**AC-4: PARTIAL annotation**
+Given the test completes
+Then the evidence bundle includes a `traceability-status.json` entry `"AC-NEW-5": "PARTIAL — chamber required for full"`.
+
+**AC-5: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+Same as NFT-LIM-01.
+
+## Constraints
+
+- Tier-2 only.
+- T_throttle is read from a fixture file (`e2e/fixtures/jetson/thermal-thresholds.json`) so future Jetson hardware updates require only a fixture bump. (Implementation relocated from the task spec's original `tests/fixtures/` suggestion to `e2e/fixtures/` so the fixture lives inside the `blackbox_tests` `Owns: e2e/**` envelope per `_docs/02_document/module-layout.md`.)
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-04
+- `_docs/02_document/tests/traceability-matrix.md` § AC-NEW-5 (PARTIAL annotation)