# Resource Limit Tests
> All tests measure resources via the `prom` (Prometheus) and `nvidia-smi-exporter` services defined in `environment.md`. None of these tests touch SUT internals.
---
### NFT-RES-LIM-01: Memory ≤8 GB shared (AC-4.2)
**Summary**: Peak resident memory + GPU memory remains under the 8 GB shared LPDDR5 cap.
**Traces to**: AC-4.2, results_report row 35, NF-T2. Tier: T1 (Docker mem accounting) + T4 (`tegrastats`).
**Preconditions**: 30-min sustained replay on Orin Nano Super 25 W (T4) or 30-min replay on x86+CUDA emulation (T1 functional only).
**Monitoring**:
- `prom` scrapes the SUT's `/metrics` endpoint for `process_resident_memory_bytes`.
- `nvidia-smi-exporter` (T4) scrapes Jetson `tegrastats` for shared-LPDDR5 usage.
**Duration**: 30 min replay.
**Pass criteria**:
- T4 binding: peak shared LPDDR5 usage < 8192 MB throughout; growth ≤ 50 MB over the 30-min window (no leak).
- T1 functional: peak resident memory < 8192 MB; growth ≤ 50 MB.
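The peak and leak checks above can be sketched as a small evaluator over the scraped `process_resident_memory_bytes` samples (helper names here are hypothetical, not part of the SUT):

```python
# Hypothetical evaluator for the NFT-RES-LIM-01 pass criteria, applied to
# memory samples (bytes) collected via `prom` over the 30-min window.
MEM_CAP_MB = 8192      # shared LPDDR5 cap (AC-4.2)
MAX_GROWTH_MB = 50     # leak bound over the 30-min window

def check_mem_limits(samples_bytes):
    """Return (peak_ok, growth_ok) for a chronological list of samples."""
    mb = [s / (1024 * 1024) for s in samples_bytes]
    peak_ok = max(mb) < MEM_CAP_MB
    # Leak check: compare the start and end of the window rather than
    # min/max, so a transient spike does not count as growth.
    growth_ok = (mb[-1] - mb[0]) <= MAX_GROWTH_MB
    return peak_ok, growth_ok
```

Using the window endpoints (rather than min/max) for the growth check is an assumption; tighten to a regression slope if spiky traces cause false passes.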
---
### NFT-RES-LIM-02: Thermal — junction temperature ≤80 °C, no throttle (results_report row 36)
**Summary**: SoC junction temperature stays below 80 °C; no thermal throttle event.
**Traces to**: results_report row 36, AC-NEW-5 (sub-budget). Tier: T4.
**Preconditions**: T4 only; +25 °C ambient.
**Monitoring**: `nvidia-smi-exporter` reads junction temp every 1 s.
**Duration**: 30 min replay.
**Pass criteria**: max(junction_temp_c) ≤ 80 °C; throttle_event_count == 0 (per `tegrastats throttle` indicator).
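A minimal sketch of the monitor side, assuming the junction-temperature token in the `tegrastats` line looks like `tj@57.1C` (verify against the actual L4T release output; the helper names are hypothetical):

```python
import re

# Assumed tegrastats token format: "tj@<float>C".
TJ_RE = re.compile(r"tj@([\d.]+)C")

def junction_temp_c(tegrastats_line):
    """Extract junction temperature (°C) from one tegrastats line, or None."""
    m = TJ_RE.search(tegrastats_line)
    return float(m.group(1)) if m else None

def thermal_pass(temps_c, throttle_event_count):
    """NFT-RES-LIM-02 pass criteria over the sampled temperatures."""
    return max(temps_c) <= 80.0 and throttle_event_count == 0
```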
---
### NFT-RES-LIM-03: AC-NEW-5 thermal envelope — 8 h @ 25 W @ +50 °C ambient
**Summary**: Cooling solution sustains 25 W for 8 h at +50 °C ambient without thermal throttling.
**Traces to**: AC-NEW-5, NF-T3, restriction §Onboard Hardware. Tier: T4 (`deferred-hil`) — requires hot-soak chamber.
**Preconditions**: hot-soak chamber, +50 °C ambient stabilized; SUT in 25 W mode running `synthetic_8h_load`.
**Monitoring**: junction temp + throttle indicator via `tegrastats`; ambient temp probe; FDR thermal log (AC-NEW-3 includes thermal traces).
**Duration**: 8 h.
**Pass criteria**: throttle_event_count == 0 over 8 h; if a throttle event does occur, it must automatically emit a STATUSTEXT to the GCS (verify this behaviour with a deliberate throttle injection in a separate run).
---
### NFT-RES-LIM-04: AC-NEW-5 cold-soak cold-start
**Summary**: Cold-start TTFF at −20 °C ambient meets AC-NEW-1 budget.
**Traces to**: AC-NEW-5 cold corner, AC-NEW-1, NF-T3 cold-soak. Tier: T4 (`deferred-hil`) — requires cold chamber.
**Preconditions**: chamber stabilized at −20 °C with SUT powered off; nav-cam + IMU sources cold-replay-ready.
**Monitoring**: TTFF timer (per FT-P-16 / FT-P-T4 cold).
**Duration**: 50 cold boots within the cold chamber.
**Pass criteria**: 95th percentile TTFF ≤ 30 s.
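The percentile criterion over the 50 boots can be evaluated with a nearest-rank p95 (the percentile method is an assumption; pin it down if a different convention is mandated):

```python
import math

def p95(samples):
    """95th percentile (nearest-rank method) of TTFF samples in seconds."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def cold_start_pass(ttff_samples, budget_s=30.0):
    """NFT-RES-LIM-04 pass criterion: p95 TTFF within budget."""
    return p95(ttff_samples) <= budget_s
```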
---
### NFT-RES-LIM-05: FDR — 8-h cap + rollover (AC-NEW-3, NF-T5)
**Summary**: After 8 h replay, FDR is ≤ 64 GB and no payload class silently dropped.
**Traces to**: AC-NEW-3, AC-8.5, NF-T5. Tier: T1 (volume-size accounting) + T4 (real disk).
**Preconditions**: clean `fdr` volume at start; `synthetic_8h_load` replay.
**Monitoring**: filesystem accounting per directory class; FDR rollover log (must record every dropped segment).
**Duration**: 8 h.
**Pass criteria**:
- Total FDR ≤ 64 GB.
- All payload classes present in the latest segment: per-frame positions with covariance + source label; FC IMU full-rate; GPS_INPUT frames; MAVLink raw stream (tlog); system health (CPU / GPU / temp / throttle); mid-flight tiles; ≤0.1 Hz failure-thumbnail log.
- For each rollover, a STATUSTEXT or rollover log entry exists; no silent drop.
- Raw nav-cam / AI-cam frames are NOT present (AC-8.5 cross-check).
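A sketch of the segment audit, assuming per-class byte counts are gathered by directory accounting; the class names below are hypothetical placeholders for the SUT's real FDR layout:

```python
FDR_CAP_BYTES = 64 * 1024**3  # 64 GB cap (AC-NEW-3)

# Hypothetical directory-class names; substitute the SUT's real FDR layout.
REQUIRED_CLASSES = {
    "positions", "fc_imu", "gps_input", "mavlink_tlog",
    "system_health", "mid_flight_tiles", "failure_thumbnails",
}
FORBIDDEN_CLASSES = {"navcam_raw", "aicam_raw"}  # AC-8.5: no raw frames

def fdr_pass(class_sizes):
    """class_sizes: dict of payload-class -> bytes in the latest segment."""
    total_ok = sum(class_sizes.values()) <= FDR_CAP_BYTES
    present = {k for k, v in class_sizes.items() if v > 0}
    present_ok = REQUIRED_CLASSES <= present      # every class non-empty
    no_raw = not (FORBIDDEN_CLASSES & set(class_sizes))
    return total_ok and present_ok and no_raw
```

The silent-drop check (rollover log vs. STATUSTEXT) needs the rollover log itself and is not covered by this size audit.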
---
### NFT-RES-LIM-06: Tile cache ≤ 10 GB persistent (restrictions §UAV)
**Summary**: Persistent satellite-tile cache for the 400 km² operational area + onboard-generated tiles fits in 10 GB.
**Traces to**: restrictions §UAV ("~10 GB" tile-cache budget). Tier: T1.
**Preconditions**: simulate 400 km² operational area (satellite tiles + DEM tiles + VPR chunk index) loaded; run a flight that generates onboard tiles; let cache settle.
**Monitoring**: filesystem size of `/probe/tiles/`.
**Duration**: 30 min replay (enough to populate onboard tiles).
**Pass criteria**: total cache size ≤ 10 GB after the flight; deduplication keeps at most one onboard-generated tile per sector.
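The filesystem accounting can be sketched as a straightforward walk over the cache root (the `/probe/tiles/` path is taken from the monitoring note above; helper names are hypothetical):

```python
import os

TILE_CACHE_CAP = 10 * 1024**3  # 10 GB budget (restrictions §UAV)

def cache_size_bytes(root):
    """Total on-disk size of all files under the tile-cache root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def tile_cache_pass(root="/probe/tiles/"):
    return cache_size_bytes(root) <= TILE_CACHE_CAP
```

Note this sums apparent file sizes, not allocated blocks; use `stat.st_blocks` if block-level accounting is required.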
---
### NFT-RES-LIM-07: GPU memory peak
**Summary**: TensorRT engines (cuVSLAM + matcher + VPR) collectively fit within Orin Nano Super shared LPDDR5 with headroom for the rest of the system.
**Traces to**: AC-4.2, NF-T2 (extended for ROS 2 image growth). Tier: T4.
**Preconditions**: all TRT engines loaded.
**Monitoring**: `tegrastats` GPU memory line.
**Duration**: steady-state 5 min after warm-up.
**Pass criteria**: GPU memory ≤ 4 GB (leaves ≥ 4 GB for ROS 2 nodes + working set + OS); engine reservation ≥ 1 GB for matcher + VPR (per NF-T2 extended).
---
### NFT-RES-LIM-08: Per-frame GPU latency budget breakdown
**Summary**: Sum of (cuVSLAM + matcher + VPR + Component 5 calibrator + Component 1b ortho) ≤ 400 ms p95 per AC-4.1.
**Traces to**: AC-4.1, NFT-PERF-01..04. Tier: T4.
**Monitoring**: per-stage timers exposed via `/metrics`.
**Duration**: 30 min replay.
**Pass criteria**: Σ p95(per-stage) ≤ 400 ms; each component within its sub-budget (cuVSLAM ≤ 20 ms, matcher inline ≤ 200 ms, ortho ≤ 50 ms, VPR ≤ 200 ms (conditional, triggers only), calibrator ≤ 5 ms).
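The budget check reduces to a per-stage comparison plus a sum over the measured p95s (stage keys below are shorthand for the components named above):

```python
# Sub-budgets in ms per AC-4.1; note the budgets sum to 475 ms, so the
# 400 ms total bound is the binding constraint when VPR actually triggers.
SUB_BUDGETS_MS = {
    "cuvslam": 20, "matcher": 200, "ortho": 50, "vpr": 200, "calibrator": 5,
}
TOTAL_BUDGET_MS = 400

def latency_pass(p95_ms):
    """p95_ms: dict of stage -> measured p95 latency in ms."""
    per_stage_ok = all(p95_ms[s] <= b for s, b in SUB_BUDGETS_MS.items())
    total_ok = sum(p95_ms.values()) <= TOTAL_BUDGET_MS
    return per_stage_ok and total_ok
```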
---
### NFT-RES-LIM-09: ROS 2 + Isaac ROS image footprint
**Summary**: Deployment image fits the documented ~200 MB growth budget over the DIY-Python baseline.
**Traces to**: M-29 cost / benefit, NF-T2 extended. Tier: T1 (image inspection).
**Steps**: build the deployment image; compare against a baseline DIY-Python image manifest; assert delta ≤ 200 MB.
**Pass criteria**: delta ≤ 200 MB; matcher + VPR engine reservation ≥ 1 GB available at runtime.
---
### NFT-RES-LIM-10: CPU usage — DDS overhead bound
**Summary**: ROS 2 DDS + topic serialisation overhead stays within the documented 25 % CPU.
**Traces to**: M-29 (Q6 → A cost / benefit). Tier: T4.
**Monitoring**: per-process CPU via `prom`; DDS process / `rmw_*` thread CPU specifically.
**Duration**: 30 min replay.
**Pass criteria**: DDS CPU mean ≤ 5 %; total SUT CPU ≤ 80 % to leave headroom for spikes.
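A sketch of the evaluator over the scraped CPU samples; whether the 80 % total bound applies to the peak or the mean is not stated above, so the peak is an assumption here:

```python
def cpu_pass(dds_cpu_samples, total_cpu_samples):
    """Samples are percent-CPU readings scraped over the 30-min replay.

    DDS bound is on the mean (per the pass criteria); the total-CPU bound
    is applied to the peak here, which is an assumption.
    """
    dds_mean = sum(dds_cpu_samples) / len(dds_cpu_samples)
    return dds_mean <= 5.0 and max(total_cpu_samples) <= 80.0
```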
---
### NFT-RES-LIM-11: Operational area ≤ 400 km² and 8-h flight cap
**Summary**: SUT correctly handles the documented operational ceiling (sector 150 km² + corridor 50 km² = 200 km² typical, up to 400 km² total).
**Traces to**: restrictions §UAV. Tier: T1 (smoke + audit).
**Steps**: configure SUT with a 400 km² operational area; verify boot-time pre-allocation respects the budget; run a 30-min synthetic flight at 60 km/h cruise (a scaled stand-in for the 8-h cap).
**Pass criteria**: SUT loads tile descriptors + VPR index without OOM; 30 min replay sustained at expected fps; resource budgets (NFT-RES-LIM-01..10) all green at this scale.
---
### NFT-RES-LIM-12: Disk I/O — FDR write rate sustainable
**Summary**: FDR write rate sustained over 8 h does not back up the writer or interfere with the inline pipeline.
**Traces to**: AC-NEW-3, AC-4.1 (no interference). Tier: T4.
**Monitoring**: NVMe write throughput (MB/s) via Prometheus + I/O wait via `vmstat`.
**Duration**: 8 h.
**Pass criteria**: write rate ≤ NVMe sustained throughput minus 30 % headroom; I/O wait does not contribute to AC-4.1 latency violations (NFT-PERF-01 still passes during the 8-h window).
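The headroom criterion ("sustained throughput minus 30 %") reduces to a 0.7× bound on the drive's sustained rate (the interference half of the criterion is covered by re-running NFT-PERF-01, not by this check):

```python
def fdr_write_rate_pass(write_mb_s_samples, nvme_sustained_mb_s):
    """NFT-RES-LIM-12 headroom check: the peak observed FDR write rate must
    leave >= 30 % headroom below the NVMe sustained throughput."""
    return max(write_mb_s_samples) <= 0.7 * nvme_sustained_mb_s
```

Applying the bound to the peak sample rather than the mean is an assumption; a mean-rate variant would be strictly weaker.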