[AZ-440] [AZ-441] [AZ-442] [AZ-443] NFT-LIM-01/02/03+05/04 blackbox scenarios

Batch 88 — adds four resource-limit blackbox scenarios + pure-logic
helpers + unit tests:

- NFT-LIM-01 Jetson memory (AC-NEW-13): tier2_only; Plan A/B budgets;
  AC-4 OOM-event scan; 30 s warm-up window; VmRSS + tegrastats streams.
- NFT-LIM-02 FDR size (AC-7.3): 30 min → 8 h linear extrapolation
  against 50 GiB; ±60 s replay-window slack for AC-1.
- NFT-LIM-03+05 storage (AC-7.4 + AC-NEW-12 + RESTRICT-STORAGE):
  aggregate ≤ 100 GiB across tile-cache + tile-cache-write +
  fdr-output; thumbnail-log < 1 GiB strict 8 h-extrapolated.
- NFT-LIM-04 thermal (AC-NEW-5 PARTIAL): tier2_only; CPU/SoC p99
  ≤ T_throttle − 5 °C; throttle-event scan; PARTIAL annotation written
  to traceability-status.json. Thresholds fixture lives at
  e2e/fixtures/jetson/thermal-thresholds.json (moved from the
  task spec's suggested tests/fixtures/ path so the file stays
  inside the blackbox_tests Owns: e2e/** envelope).

All four helpers are public-boundary-only (no src/gps_denied_onboard
imports). Scenarios skip cleanly in the Tier-1 docker harness pending
AZ-595 (SITL replay builder) for the four shared fixture inputs and
AZ-444 (Tier-2 Jetson runner) for the tier2_only scenarios.

Code review: PASS_WITH_WARNINGS (0/0/2/1). Both Mediums are
carried-over write_csv_evidence + _resolve_fixture_path duplication,
deferred to AZ-446 (batch 89). Low is the self-resolved AZ-443 fixture
ownership drift documented in the review.

Tests: 1223 e2e/_unit_tests passing (+1 vs. batch 87 from the new
directory-layout entry); 24 resource_limit scenarios collect and skip
cleanly under runner/pytest.ini.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 18:01:55 +03:00
parent d1e30f818f
commit 6e4a575221
22 changed files with 2785 additions and 4 deletions
@@ -0,0 +1,77 @@
# NFT-LIM-01 — Jetson memory budget
**Task**: AZ-440_nft_lim_01_jetson_memory
**Name**: Steady-state memory ≤ 4.5 GB (Plan A) / ≤ 6.0 GB (Plan B); peak ≤ 5.0 GB / 6.5 GB; no OOM (AC-NEW-13)
**Description**: Implement NFT-LIM-01 — Tier-2 ONLY; 5 min Derkachi replay + 30 s warm-up; sample memory at 1 Hz from `/proc/<pid>/status` AND `tegrastats`; assert steady-state and peak budgets per Plan A / Plan B; no OOM kills observed.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-444
**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
**Tracker**: AZ-440
**Epic**: AZ-262 (E-BBT)
## Problem
The Jetson Orin Nano Super has 8 GB of RAM in total; the SUT must operate within the Plan A (4.5 GB) or Plan B (6.0 GB) budget to leave headroom for OS + suite. AC-NEW-13 prescribes the budgets and the Plan-A/Plan-B switching rules.
## Outcome
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_01_jetson_memory.py`. Tier-2 ONLY.
- 5 min Derkachi replay + 30 s warm-up.
- Per-second memory sample from (a) `/proc/<pid>/status` `VmRSS`, (b) `tegrastats` (system-level memory).
- Compute steady-state (post-warm-up p50) and peak (post-warm-up max).
- Assert `steady_state ≤ 4.5 GB` (Plan A default) AND `peak ≤ 5.0 GB`. Plan B (6.0 / 6.5 GB) gated behind `MEMORY_PLAN=B` flag.
- Assert no OOM kills in `dmesg` during the run.
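
The steady-state and peak checks listed above can be sketched as a pure-logic helper. This is a minimal illustration, not the batch's actual helper: the function name, the `(seconds, GB)` sample shape, and the choice to treat the sample at exactly 30 s as post-warm-up are assumptions. Only the budgets and the 30 s window come from the spec.

```python
from statistics import median

# (steady-state, peak) budgets in GB per plan, as quoted in the spec.
BUDGETS_GB = {"A": (4.5, 5.0), "B": (6.0, 6.5)}
WARMUP_S = 30  # samples taken before this offset are discarded


def evaluate_memory(samples, plan="A"):
    """Hypothetical helper. samples: list of (seconds_since_start, gb) at 1 Hz."""
    steady_budget, peak_budget = BUDGETS_GB[plan]
    # Assumption: the sample at exactly WARMUP_S counts as post-warm-up.
    post = [gb for t, gb in samples if t >= WARMUP_S]
    if not post:
        raise ValueError("no post-warm-up samples")
    steady = median(post)  # p50 of the post-warm-up window
    peak = max(post)
    return {
        "steady_gb": steady,
        "peak_gb": peak,
        "steady_ok": steady <= steady_budget,
        "peak_ok": peak <= peak_budget,
    }
```

The same function would be applied once per stream (`VmRSS` and `tegrastats`), with `plan` taken from the `MEMORY_PLAN` flag.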
## Scope
### Included
- `/proc/<pid>/status` + `tegrastats` sampling.
- Steady-state + peak computation.
- OOM detection in `dmesg`.
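
For the `/proc/<pid>/status` side of the sampling, a parser along these lines would be enough. The function name is hypothetical. The `VmRSS:` line format is the one the Linux kernel emits, where the `kB` unit denotes 1024-byte blocks.

```python
import re

# Matches e.g. "VmRSS:\t  123456 kB" anywhere in the status text.
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB, or None when the field is absent (kernel threads)."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None
```

A 1 Hz sampler would read the file once per second and pass its contents to `parse_vmrss_kb`; converting kB to the spec's GB budgets is left to the caller, since the source does not say whether GB means 10^9 or 2^30 bytes.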
### Excluded
- FDR size budget — owned by NFT-LIM-02 (AZ-441).
- Tier-1 memory measurement — irrelevant (Docker on x86 has different budgets).
## Acceptance Criteria
**AC-1: tier guard**
Given `tier == tier1-docker`
Then the test SKIPs.
**AC-2: steady-state budget (Plan A)**
Given the post-warm-up samples
Then `p50(VmRSS) ≤ 4.5 GB` AND `p50(tegrastats system memory) ≤ 4.5 GB`.
**AC-3: peak budget (Plan A)**
Given the same samples
Then `max(VmRSS) ≤ 5.0 GB`.
**AC-4: no OOM kills**
Given `dmesg --since "<run_start>"`
Then no entries match `oom-killer` or `Killed process .*gps-denied-onboard`.
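
A minimal sketch of the AC-4 scan, assuming the `dmesg` output has already been captured as a list of lines. The two patterns are the ones named in AC-4; the function name is illustrative only.

```python
import re

# Patterns quoted in AC-4.
OOM_PATTERNS = (
    re.compile(r"oom-killer"),
    re.compile(r"Killed process .*gps-denied-onboard"),
)


def find_oom_events(dmesg_lines):
    """Return every line that matches at least one OOM pattern."""
    return [
        line
        for line in dmesg_lines
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]
```

The scenario would then assert that `find_oom_events(lines)` is empty. Note that a kill of an unrelated process is only caught if the same log also contains an `oom-killer` line.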
**AC-5: Plan B gated**
Given `MEMORY_PLAN=B` env flag
Then the budgets relax to `steady ≤ 6.0 GB` AND `peak ≤ 6.5 GB`.
**AC-6: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end on real hardware through public boundaries.
- **Allowed**: `/proc`, `tegrastats` (public OS / NVIDIA telemetry), `dmesg`.
- **Forbidden**: instrumenting SUT internal allocators.
## Constraints
- Tier-2 only.
- Plan A / Plan B switching is per the SUT's documented config; the test does NOT trigger the switch — it observes and reports the active plan.
## Document Dependencies
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-01
- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-01 row)
@@ -0,0 +1,57 @@
# NFT-LIM-02 — FDR size budget
**Task**: AZ-441_nft_lim_02_fdr_size
**Name**: 8 h-extrapolated FDR size ≤ 50 GB (AC-7.3)
**Description**: Implement NFT-LIM-02 — replay 30 min Derkachi at typical fixed rates; measure FDR-archive growth (`du -sh fdr-output`); extrapolate to 8 h linearly; assert `8h_extrapolated_size ≤ 50 GB`.
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
**Tracker**: AZ-441
**Epic**: AZ-262 (E-BBT)
## Problem
FDR size budget (AC-7.3) protects the on-board storage from being filled mid-flight. A 30 min run extrapolated to 8 h is the canonical measurement.
## Outcome
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_02_fdr_size.py`. Tier-1 OR Tier-2.
- 30 min Derkachi replay (loop the 8 min flight ~4×); per-minute `du -sh fdr-output` sampling.
- Linear extrapolation to 8 h: `extrapolated_size = (size_at_30min / 30) × (8 × 60)`.
- Assert `extrapolated_size ≤ 50 GB`.
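
As a worked example of the extrapolation: 8 h is 480 min, so the scale factor is 480 / 30 = 16, and the 50 GB budget is met only if the archive is at most 50 / 16 = 3.125 GB after 30 min. A small sketch follows; the function and constant names are hypothetical.

```python
SAMPLE_MINUTES = 30
TARGET_MINUTES = 8 * 60
FDR_BUDGET_GB = 50.0


def extrapolate_to_8h(size_at_30min_gb):
    """Linear extrapolation; algebraically equal to (size / 30) * 480."""
    return size_at_30min_gb * (TARGET_MINUTES / SAMPLE_MINUTES)


def fdr_within_budget(size_at_30min_gb):
    return extrapolate_to_8h(size_at_30min_gb) <= FDR_BUDGET_GB
```

The multiplication is written as `size * 16` rather than `(size / 30) * 480` only to avoid an avoidable floating-point rounding step right at the budget boundary.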
## Scope
### Included
- 30 min replay (looped 8 min × 4).
- Per-minute size sampling.
- Linear extrapolation.
### Excluded
- Storage budget for tile cache + tiles → owned by NFT-LIM-03 (AZ-442).
## Acceptance Criteria
**AC-1: 30 min replay**
Given the test runs
Then the runner loops Derkachi for 30 min wall-clock.
**AC-2: extrapolation budget**
Given per-minute samples
Then `(size_at_30min / 30) × 480 ≤ 50 GB`.
**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: `du -sh` of mounted volumes.
- **Forbidden**: importing FDR writer state.
## Document Dependencies
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-02
- `_docs/02_document/tests/test-data.md` § Resource Limits (NFT-LIM-02 row)
@@ -0,0 +1,55 @@
# NFT-LIM-03 + NFT-LIM-05 — Aggregate storage budget + thumbnail-log budget
**Task**: AZ-442_nft_lim_03_05_storage_budget
**Name**: Aggregate on-disk storage + thumbnail-log budget (AC-7.4 / AC-NEW-12 / RESTRICT-STORAGE)
**Description**: Combined coverage for two storage-budget scenarios that share the same volume measurement: NFT-LIM-03 (aggregate `tile-cache + tile-cache-write + fdr-output ≤ 100 GB`) and NFT-LIM-05 (thumbnail-log size component < 1 GB / 8 h).
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
**Tracker**: AZ-442
**Epic**: AZ-262 (E-BBT)
## Problem
The aggregate storage budget bounds how much disk the SUT may consume on the companion. Two related scenarios share this measurement and are combined to avoid duplication.
## Outcome
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_03_05_storage_budget.py`. Tier-1 OR Tier-2.
- 30 min Derkachi replay; per-minute `du -sh` of `tile-cache`, `tile-cache-write`, `fdr-output`, and the thumbnail-log subdirectory.
- NFT-LIM-03: assert `aggregate(tile-cache + tile-cache-write + fdr-output) ≤ 100 GB`.
- NFT-LIM-05: extrapolate thumbnail-log subdirectory to 8 h; assert `extrapolated_thumbnail_log < 1 GB`.
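
The two assertions differ in one detail worth making explicit: the aggregate budget is inclusive (`≤`), while the thumbnail-log budget is strict (`<`). The sketch below illustrates that, assuming sizes have already been measured in bytes and assuming 1 GB = 10^9 bytes (the spec does not state the convention). All names are hypothetical.

```python
GB = 10**9  # assumption: decimal gigabytes
AGGREGATE_BUDGET_BYTES = 100 * GB
THUMBNAIL_BUDGET_BYTES = 1 * GB
AGGREGATE_VOLUMES = ("tile-cache", "tile-cache-write", "fdr-output")


def check_storage(final_sizes_bytes, thumb_at_30min_bytes):
    """Return (aggregate_ok, thumbnail_ok) for NFT-LIM-03 and NFT-LIM-05."""
    aggregate = sum(final_sizes_bytes[name] for name in AGGREGATE_VOLUMES)
    thumb_8h = thumb_at_30min_bytes * (480 / 30)  # 30 min -> 8 h
    aggregate_ok = aggregate <= AGGREGATE_BUDGET_BYTES  # inclusive bound
    thumbnail_ok = thumb_8h < THUMBNAIL_BUDGET_BYTES  # strict bound
    return aggregate_ok, thumbnail_ok
```

With these constants, a thumbnail log of exactly 62.5 MB after 30 min extrapolates to exactly 1 GB and therefore fails the strict check.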
## Scope
### Included
- 30 min replay (shareable with NFT-LIM-02 if combined in CI orchestration).
- Per-volume size sampling.
- Both budget assertions.
### Excluded
- FDR-only size — owned by NFT-LIM-02.
- Mid-flight tile generation rate — owned by FT-P-17 (AZ-422).
## Acceptance Criteria
**AC-1: aggregate budget**
Given a 30 min replay
Then the summed `du -sh` sizes of `tile-cache`, `tile-cache-write`, and `fdr-output` are ≤ 100 GB at the end of the run.
**AC-2: thumbnail-log 8 h budget**
Given the per-minute samples of the thumbnail-log subdirectory
Then `(size_at_30min_thumb / 30) × 480 < 1 GB`.
**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
Same as NFT-LIM-02.
## Document Dependencies
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-03, § NFT-LIM-05
- `_docs/02_document/tests/test-data.md` § Resource Limits
@@ -0,0 +1,66 @@
# NFT-LIM-04 — Thermal envelope on Jetson
**Task**: AZ-443_nft_lim_04_thermal
**Name**: Sustained thermal headroom + AC-NEW-5 PARTIAL acceptance (AC-NEW-5)
**Description**: Implement NFT-LIM-04 — Tier-2 ONLY; sustained 30 min Derkachi loop at workstation ambient; record CPU/GPU/SoC temperatures via `tegrastats`; assert no thermal throttling kicks in. Mark as PARTIAL coverage of AC-NEW-5 (the +50 °C chamber portion is a separate release-gate scenario).
**Complexity**: 2 points
**Dependencies**: AZ-406, AZ-407, AZ-444
**Component**: Blackbox Tests / Resource Limit (epic AZ-262)
**Tracker**: AZ-443
**Epic**: AZ-262 (E-BBT)
## Problem
Thermal behavior at workstation ambient provides only partial coverage of AC-NEW-5: it cannot prove the +50 °C envelope, but it can flag obvious thermal regressions.
## Outcome
- pytest scenario at `e2e/tests/resource_limit/test_nft_lim_04_thermal.py`. Tier-2 ONLY.
- 30 min Derkachi loop; per-second `tegrastats` capture; assert no thermal-throttling event in `dmesg` AND p99 `cpu_temp ≤ T_throttle - 5 °C` (5 °C headroom).
- Annotate `traceability-matrix.md` AC-NEW-5 status as PARTIAL (chamber required for full).
## Scope
### Included
- 30 min loop.
- `tegrastats` + `dmesg` capture.
- PARTIAL annotation in evidence bundle.
### Excluded
- +50 °C chamber portion — owned by a separate release-gate scenario, not in this CI scope.
## Acceptance Criteria
**AC-1: tier guard**
Given `tier == tier1-docker`
Then the test SKIPs.
**AC-2: no thermal throttle**
Given the 30 min loop
Then `dmesg --since "<run_start>"` shows no entries matching `thermal_throttle` / `tegra_thermal_zone`.
**AC-3: 5 °C headroom**
Given the per-second `tegrastats` samples
Then `p99(cpu_temp) ≤ T_throttle - 5 °C` AND `p99(soc_temp) ≤ T_throttle - 5 °C`. T_throttle is the hardware-documented value (97 °C for CPU, 95 °C for SoC on Orin Nano per NVIDIA docs), read from a fixture file at runtime.
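
The headroom check could look roughly like the sketch below. The thresholds are hard-coded here from the values quoted in AC-3 purely for illustration; the real scenario reads them from the fixture file, whose schema is not shown in the source. The nearest-rank percentile method and all names are assumptions as well.

```python
import math

HEADROOM_C = 5.0
# Illustrative copy of the AC-3 values; the fixture file is authoritative.
T_THROTTLE_C = {"cpu": 97.0, "soc": 95.0}


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def headroom_ok(samples_c, zone):
    """True when p99 of the zone's samples leaves at least 5 °C of headroom."""
    limit = T_THROTTLE_C[zone] - HEADROOM_C
    return percentile_nearest_rank(samples_c, 99) <= limit
```

With 1800 samples from a 30 min run at 1 Hz, the p99 value is the 1782nd smallest reading, so up to 18 hotter outliers are tolerated.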
**AC-4: PARTIAL annotation**
Given the test completes
Then the evidence bundle includes a `traceability-status.json` entry `"AC-NEW-5": "PARTIAL — chamber required for full"`.
**AC-5: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
Same as NFT-LIM-01.
## Constraints
- Tier-2 only.
- T_throttle is read from a fixture file (`e2e/fixtures/jetson/thermal-thresholds.json`) so future Jetson hardware updates require only a fixture bump. (Implementation relocated from the task spec's original `tests/fixtures/` suggestion to `e2e/fixtures/` so the fixture lives inside the `blackbox_tests` `Owns: e2e/**` envelope per `_docs/02_document/module-layout.md`.)
## Document Dependencies
- `_docs/02_document/tests/resource-limit-tests.md` § NFT-LIM-04
- `_docs/02_document/tests/traceability-matrix.md` § AC-NEW-5 (PARTIAL annotation)