# Resource Limit Tests

Authored by `/test-spec` Phase 2 (2026-05-19).

Resource-limit tests assert that the SUT stays within a quantified resource ceiling for the configured duration. Short bursts do not satisfy these tests: every scenario has an explicit sustained-monitoring window.

---

### NFT-RES-LIM-Re1: Combined onboard RSS ≤ 6 GB sustained

**Summary**: Combined process RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay ≤ 6 GB throughout a 5-minute steady-state window with the full onboard workload active.

**Traces to**: AC `Resources & Data — Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within ≤ 6 GB / Re1`, RESTRICT `Hardware — Compute device: Jetson Orin Nano Super, 8 GB shared LPDDR5; Tier 1 consumes ~2 GB, leaving ~6 GB for autopilot`.

**Tier**: HW (representative Jetson Orin Nano Super). Pure-x86 reports are informational only and do NOT satisfy the project-level Acceptance Gate.

**Preconditions**:
- Full onboard workload active: frame ingest from `rtsp-loopback`, Tier-2 + Tier-3 (when enabled) inferring at the documented steady-state load, gimbal commands flowing, MAVLink stream consumed at 10 Hz, operator-stream connected, MapObjects store hydrated for a 30×30 km region.
- Warm-up: 60 s before measurement starts (any first-load model warm-up complete).
- Tier-1 process is RUNNING in parallel but its RSS is EXCLUDED from the measurement (the AC scope is autopilot-owned RSS, excluding Tier 1).

**Monitoring**:
- Cgroup-level RSS for every process the SUT owns (the SUT binary plus any child processes it spawns, e.g. the VLM IPC peer if it lives in autopilot's cgroup), sampled at 1 Hz.
- Cgroup-level RSS for Tier 1 sampled at the same cadence (for the Re2 cross-reference).
- Per-process RSS captured to `reports//rss-trace.csv` for forensic review on failure.

**Duration**: 5 minutes of measurement after warm-up.

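
As a rough illustration of how the 1 Hz trace described under Monitoring could be checked against the 6 GB ceiling from the Summary, the sketch below sums per-process RSS for each sample second and lists the seconds whose combined total exceeds the ceiling. The `RssSample` record, the function names, and the reading of "6 GB" as 6 GiB are assumptions made for this example and are not defined by the spec. The caller is assumed to have already filtered out Tier-1 samples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

GIB = 1024 ** 3
# Assumption: the spec's "6 GB" is read as 6 GiB; adjust if decimal GB is meant.
RSS_CEILING_BYTES = 6 * GIB


class RssSample(NamedTuple):
    """Hypothetical record for one per-process reading in the RSS trace."""
    second: int      # index of the 1 Hz sample within the measurement window
    process: str     # label of an autopilot-owned process (Tier 1 excluded)
    rss_bytes: int   # resident set size of that process at that second


def combined_rss_per_second(samples: Iterable[RssSample]) -> Dict[int, int]:
    """Sum RSS over all autopilot-owned processes for each sample second."""
    totals: Dict[int, int] = defaultdict(int)
    for sample in samples:
        totals[sample.second] += sample.rss_bytes
    return dict(totals)


def ceiling_violations(samples: Iterable[RssSample],
                       ceiling: int = RSS_CEILING_BYTES) -> List[int]:
    """Return the sample seconds whose combined RSS is above the ceiling."""
    totals = combined_rss_per_second(samples)
    return sorted(sec for sec, total in totals.items() if total > ceiling)
```

An empty list from `ceiling_violations` would correspond to every 1 s sample staying at or below the ceiling; any returned second identifies a sample to inspect in the trace.
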
**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_owned_RSS) ≤ 6 GB`.
- No single 1 s sample exceeds the ceiling.
- (Reporting only, not pass/fail): peak RSS, mean RSS, P95 RSS recorded in the CSV report.

**Test status**: DEFERRED: ``.

---

### NFT-RES-LIM-Re2: Tier-1 non-degradation under autopilot workload

**Summary**: When autopilot's full onboard workload runs concurrently with Tier 1 on the same Jetson, Tier-1 per-frame latency MUST NOT degrade by more than ± 5 ms versus the Tier-1-alone baseline (recorded by NFT-PERF-L1).

**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms when autopilot's own onboard workload is running concurrently / Re2`, RESTRICT `Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`.

**Tier**: HW (the only meaningful environment for this assertion; GPU contention behaviour does not reproduce on x86).

**Preconditions**:
- NFT-PERF-L1 has been run on the same HW configuration in the SAME session and a baseline `tier1_baseline_p95_ms` recorded.
- Full onboard workload active (same as Re1).

**Monitoring**:
- Tier-1 per-frame latency sampled per frame for the duration of the test.
- The same metric source as NFT-PERF-L1, for direct delta comparison.

**Duration**: 5 minutes of measurement after warm-up (matches Re1 window so both can run in the same session).

**Pass criteria**:
- `numeric_tolerance`: `|p95(tier1_with_autopilot) - tier1_baseline_p95_ms| ≤ 5 ms`.
- (Reporting only): mean, P95, max delta over the window.

**Test status**: DEFERRED: same fixture dependency as Re1; requires SUT operational + Tier 1 colocated on HW.

---

### NFT-RES-LIM-Storage: On-device persistent store stays under 95 % for in-flight operation

**Summary**: During a steady-state mission run (no abnormal load), the on-device persistent store MUST NOT exceed 95 % full.
This protects the takeoff gate (R3) from being silently violated mid-mission and protects the post-flight push (Mp4) from running out of room to persist a failed diff.

**Traces to**: AC `Reliability & Safety — On-device storage MUST be bounded` (via R3 BIT gate), RESTRICT `On-device storage MUST be bounded`.

**Tier**: B + HW.

**Preconditions**:
- SUT mid-flight; persistent store at typical post-takeoff utilisation (e.g. 30 %).
- Normal-operation event volume: telemetry persistence, ignored-item appends, pending map-diff buffer (empty in this scenario).

**Monitoring**:
- Volume utilisation sampled at 10 Hz throughout the duration.

**Duration**: 60 minutes (representative mission duration per Mp3).

**Pass criteria**:
- `threshold_max`: `volume_used / volume_total ≤ 0.95` at every sample point.
- On approach to 85 %: structured-log INFO `storage_pressure` with current utilisation.
- On approach to 90 %: structured-log WARN with current utilisation; health.storage transitions to yellow.
- On 95 %: health.storage transitions to red; the SUT begins its documented eviction policy (this scenario does NOT test the policy semantics; that belongs to its own scenario. This scenario only asserts the policy IS triggered).

**Test status**: READY (no external fixture beyond the SUT itself; the persistent-store seed file controls starting utilisation).

---

### NFT-RES-LIM-CPU: CPU headroom for the Tier-1 colocation guarantee

**Summary**: Combined CPU utilisation of every autopilot-owned process MUST leave enough Jetson CPU headroom for Tier 1 to keep its NFT-PERF-L1 budget. Concretely: per-second sustained CPU usage by autopilot-owned processes MUST stay ≤ the configured budget (default 60 % of total CPU cycles measured at the cgroup level) for the duration of the run.

**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms / Re2` (CPU-side mechanism backing Re2), RESTRICT `Hardware — Jetson Orin Nano Super`.

**Tier**: HW (CPU contention does not reproduce on x86).

**Preconditions**:
- Same workload as Re1 + Re2.

**Monitoring**:
- Cgroup CPU usage at 1 Hz.

**Duration**: 5 minutes after warm-up.

**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_cpu_usage) ≤ 60 %` of total CPU.
- Reporting: mean, P95, max.

**Test status**: DEFERRED: same dependency as Re1/Re2.

---

### NFT-RES-LIM-GPU: GPU mutual exclusion contract (Tier 1 vs local large model)

**Summary**: Per RESTRICT (`Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`), the SUT MUST NOT issue a GPU compute call (e.g. Tier-3 VLM inference) while Tier 1 is executing on the GPU. The serialisation MUST be observable: at any instant, at most one of the two workloads occupies the GPU.

**Traces to**: RESTRICT `Tier 1 and any local large model … only one of them may execute at any wall-clock instant`.

**Tier**: HW.

**Preconditions**:
- Tier 1 active; SUT in a ZoomedIn hold with deep-analysis enabled (Tier-3 will fire).

**Monitoring**:
- GPU-instance occupancy via `tegrastats` / equivalent at the highest available sampling rate.
- The SUT's own internal "compute-class" telemetry exposed on the health endpoint as `gpu_owner_current` ∈ { "tier1", "tier3", "idle" }.

**Duration**: 60 s containing ≥ 5 Tier-3 hold cycles.

**Pass criteria**:
- `exact`: at every sample point, `gpu_owner_current ∈ { "tier1", "tier3", "idle" }`; never `tier1` and `tier3` simultaneously.
- `tegrastats` peak GPU occupancy attributable to autopilot processes never overlaps Tier 1's known activity window for the same wall-clock instant.

**Test status**: DEFERRED: depends on the SUT being operational end-to-end + Tier-3 enabled; also depends on the SUT exposing `gpu_owner_current` (which is an architectural choice not yet locked).

---

### NFT-RES-LIM-FileHandles: File-descriptor and socket bound

**Summary**: Sustained operation MUST NOT leak file descriptors or sockets.
The count MUST stay within a documented headroom of the initial-post-warmup baseline for the duration of the run.

**Traces to**: RESTRICT `On-device storage MUST be bounded` (general bounded-resource principle), security principle `No silent error swallowing for security-relevant failures` (FD exhaustion would silently break the operator-stream).

**Tier**: B + HW.

**Preconditions**:
- Warm-up: 60 s.
- Workload: full onboard workload at steady state.

**Monitoring**:
- `/proc//fd` count per autopilot process at 1 Hz.

**Duration**: 60 minutes.

**Pass criteria**:
- `threshold_max`: at every sample point, `fd_count ≤ fd_baseline_post_warmup + 50` (50 = documented churn headroom for intermittent operator reconnects).
- A monotonically rising trend (slope > 0 over the run) is a TEST FAILURE even if the absolute ceiling is not breached.

**Test status**: READY for a Tier-B run; gains its real value once HW + sustained-workload land.

---

## Common assertions for every resource-limit scenario

- **Sustained-monitoring is non-negotiable.** Each scenario specifies a duration ≥ 60 s; short bursts that pass do not satisfy the test. The CSV report records the full sample trace path under `artifacts_path`.
- **No silent eviction.** Where a ceiling is approached, the SUT MUST surface the pressure (structured-log INFO at 85 %, WARN at 90 %, transition to yellow/red on health) BEFORE reaching the ceiling. A pass with no observable pressure signal at thresholds is a TEST FAILURE.
- **HW reporting vs gating.** Pure-x86 runs report informational deltas only; they do NOT satisfy the project-level Acceptance Gate. Every CSV row records its tier so this distinction stays auditable.
- **Re1 + Re2 are paired.** Re1 establishes the autopilot RSS ceiling; Re2 establishes that respecting Re1 does not cost Tier 1 latency. They MUST be run in the same session to make the Re2 baseline meaningful.
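
The two NFT-RES-LIM-FileHandles pass criteria (an absolute ceiling of baseline + 50, plus a failure on any rising trend) could be evaluated over a recorded 1 Hz trace roughly as follows. This is a minimal sketch, not the harness implementation: the function names and the `slope_tolerance` parameter are hypothetical, and reading "slope > 0 over the run" as an ordinary least-squares slope across all samples is an assumption.

```python
from typing import List, Sequence

FD_HEADROOM = 50  # churn headroom stated in the pass criteria


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the best-fit line through (i, values[i]), in FDs per sample."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def evaluate_fd_trace(fd_counts: Sequence[int],
                      baseline: int,
                      headroom: int = FD_HEADROOM,
                      slope_tolerance: float = 0.0) -> List[str]:
    """Return failure messages; an empty list means both criteria hold.

    slope_tolerance is a hypothetical knob for absorbing sampling noise;
    the default of 0.0 mirrors the spec's "slope > 0" wording.
    """
    failures: List[str] = []
    ceiling = baseline + headroom
    breaches = [i for i, count in enumerate(fd_counts) if count > ceiling]
    if breaches:
        failures.append(f"ceiling {ceiling} breached at sample {breaches[0]}")
    slope = least_squares_slope(fd_counts)
    if slope > slope_tolerance:
        failures.append(f"rising trend: slope {slope:.4f} FDs/sample")
    return failures
```

A trace that gains one descriptor per sample fails on the trend check even while it remains well under the ceiling, which is the leak case the second criterion is meant to catch. With a strict threshold of zero, ordinary reconnect churn may also register as a positive slope, so a real harness would likely need to decide how much noise to tolerate.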